Methods for genetic analysis of DNA using biased amplification of polymorphic sites

ABSTRACT

Methods for determining genotypes and haplotypes of genes are described. Also described are single nucleotide polymorphisms and haplotypes in the ApoE gene and methods of using that information.

RELATED APPLICATION

[0001] This application claims the benefit of Stanton et al., U.S. Provisional Application No. 60/206,613, filed May 23, 2000, entitled METHODS FOR GENETIC ANALYSIS OF DNA, which is hereby incorporated by reference in its entirety, including drawings.

BACKGROUND OF THE INVENTION

[0002] This application describes methods for the genetic analysis of biologically, medically and economically significant traits in mammals and other organisms, including humans. Genetic analysis refers to the determination of the nucleotide sequence of a gene or genes of interest in a subject organism, including methods for analysis of one site of sequence variation (i.e. genotyping methods) and methods for analysis of a collection of sequence variations (haplotyping methods). Genetic analysis further includes methods for correlating sequence variation with disease risk, diagnosis, prognosis or therapeutic management.

[0003] The use of novel genotyping and haplotyping methods for genetic analysis of the apolipoprotein E (ApoE) gene is described. These methods entail use of novel ApoE DNA sequence polymorphisms and haplotypes. The ApoE alleles and genetic analysis methods of this application will allow more sensitive measurement of the contribution of ApoE genetic variation to medically important phenotypes such as risk of heart disease, risk of Alzheimer's disease and response to various therapeutic interventions, including pharmacotherapy.

[0004] This application also describes new methods for genotyping a DNA sample based on analysis of the mass of cleaved DNA fragments using mass spectrometry. These genotyping methods are better suited to the present and future requirements of DNA testing than current genotyping methods as a result of improved accuracy, decreased set-up and reagent costs, reduced complexity and excellent compatibility with automation.

[0005] At present, DNA diagnostic testing is largely concerned with identification of rare polymorphisms related to Mendelian traits. These tests have been in use for well over a decade. In the future, genetic testing will come into much wider clinical and research use as a means of making predictive, diagnostic, prognostic and pharmacogenetic assessments. These new genetic tests will in many cases involve multigenic conditions, where the correlation of genotype and phenotype is significantly more complex than for Mendelian phenotypes. Producing genetic tests with the requisite accuracy will require new methods that can simultaneously track multiple DNA sequence variations at low cost and high speed, without compromising accuracy. Many tests will be evaluated in the clinical research setting, but only a small fraction will become major diagnostic tests; the clinical research process will reveal that most polymorphisms lack significant functional effects. The genetic analysis methods described in this application are relatively inexpensive to set up and run, while providing extremely high accuracy and, most important, enabling sophisticated genetic analysis. They are therefore optimally suited to the exigencies of genetic test development in coming years.

[0006] The association of specific genotypes with disease risk, prognosis, and diagnosis, as well as selection of optimal therapy for disease, are some of the benefits expected to ensue from the human genome project. At present, the most common type of genetic study design for testing the association of genotypes with medically important phenotypes is a case control study, where allele frequencies are measured in one or more phenotypically defined groups of cases and compared to allele frequencies in controls. (Alternatively, phenotype frequencies in two or more genotypically defined groups are compared.) The majority of such published genetic association studies have focused on measuring the contribution of a single polymorphic site (usually a single nucleotide polymorphism, abbreviated SNP) to variation in a medically important phenotype or phenotypes. In these studies one polymorphism serves as a proxy for all variation in a gene (or even a cluster of adjacent genes).

[0007] The limitations of such single polymorphism association analysis are becoming increasingly apparent. Recent articles (e.g. Terwilliger, J. and K. M. Weiss. Linkage disequilibrium mapping of complex disease: fantasy or reality? Current Opinion in Biotechnology 9: 578-594, 1998) have drawn attention to the low quality of most association studies using single polymorphic sites (evidenced by their low degree of reproducibility). Some of the reasons for the lack of reproducibility of many association studies are apparent. In particular, the extent of human DNA polymorphism (most genes contain 10 or more polymorphic sites, and many genes contain over 100 polymorphic sites) is such that a single polymorphic site can only rarely serve as a reliable proxy for all variation in a gene (which typically covers at least several thousand nucleotides and can extend over 1,000,000 nucleotides). Even in cases where one polymorphic site is responsible for significant biological variation, there is no reliable method for identifying such a site. The haplotyping and genetic analysis methods described in this application provide a systematic way to identify such polymorphic sites.

[0008] Several recent studies have begun to outline the extent of human molecular genetic variation. For example, a comprehensive survey of genetic variation in the human lipoprotein lipase (LPL) gene (Nickerson, D. A., et al. Nature Genetics 19: 233-240, 1998; Clark, A. G., et al. American Journal of Human Genetics 63: 595-612, 1998) compared 71 human subjects and found 88 varying sites in a 9.7 kb region. On average, any two versions of the gene differed at 17 sites. This and other studies show that sequence variation may be present at approximately 1 in 100 nucleotides when 50 to 100 unrelated subjects are compared. The implication of these data is that, in order to create genetic diagnostic tests of sufficient specificity and selectivity to justify widespread medical use, more sophisticated methods are needed for measuring human genetic variation.

[0009] Beyond tests that measure the status of a single polymorphic site, the next level of sophistication in genetic testing is to genotype two or more polymorphic sites and keep track of the genotypes at each of the polymorphic sites when calculating the association between genotypes and phenotypes (e.g. using multiple regression methods). However, this approach, while an improvement on the single polymorphism method in terms of considering possible interactions between polymorphisms, is limited in power as the number of polymorphic sites increases. The reason is that the number of genetic subgroups that must be compared increases exponentially as the number of polymorphic sites increases. In a medical study of fixed size this has the effect of dramatically increasing the number of groups that must be compared, while reducing the size of each subgroup to a small number. The consequence of these effects is an unacceptable loss of statistical power. Consider, for example, a clinical study of a gene that contains 10 variable sites. If each site is biallelic, then there are 2¹⁰ = 1,024 possible combinations of polymorphic sites. If the study population is 500 subjects, then it is likely that many genetically defined subgroups will contain only a small number of subjects. Thus, consideration of multiple polymorphisms (as can be determined from DNA sequence data, for example) does not address the problem that the DNA sequence from a diploid subject does not sufficiently constrain the sequence of the subject's two chromosomes to be very useful for statistical analysis. Only direct determination of the DNA sequence on each chromosome (a haplotype) can constrain the number of genetic variables in each subject to two (allele 1 and allele 2), while accounting for all, or at least a substantial subset, of the polymorphisms.
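The arithmetic behind the 10-site example can be sketched in a few lines of Python. The function name `mean_subjects_per_class` is hypothetical. The sketch assumes that subjects are spread evenly over all combinations. That overstates the spread in a real population, but it shows how quickly subgroup size collapses as sites are added.

```python
def mean_subjects_per_class(n_sites: int, n_subjects: int, alleles_per_site: int = 2):
    """Return (number of allele combinations, mean subjects per combination).

    Assumes every site has `alleles_per_site` alleles and that subjects are
    spread evenly over all combinations (a deliberate simplification).
    """
    n_combinations = alleles_per_site ** n_sites
    return n_combinations, n_subjects / n_combinations


combos, per_class = mean_subjects_per_class(n_sites=10, n_subjects=500)
print(combos, round(per_class, 2))  # 1024 0.49
```

Under these assumptions, each of the 1,024 combinations would hold fewer than one subject on average.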

[0010] A much more powerful measure of variation in a DNA segment, then, is a haplotype, that is, the set of polymorphisms that are found on a single chromosome. Because of the evolutionary history of human populations, only a small fraction of all possible haplotypes (given a set of polymorphic sites at a locus) actually occur at appreciable frequency. For example, in a gene with 10 polymorphic sites only a small fraction (perhaps in the range of 1%) of the 1,024 possible haplotypes is likely to exist at a frequency greater than 5% in a human population. Further, as described below, haplotypes can be clustered in groups of related sequences to facilitate genetic analysis. Thus determination of haplotypes is a simplifying step in performing a genetic association study (compared to the analysis of multiple polymorphisms), particularly when applied to DNA segments characterized by many polymorphic sites. There is also a potent biological rationale for sorting genes by haplotype, rather than by genotype at one polymorphic site: polymorphic sites on the same chromosome may interact in a specific way to determine gene function. For example, consider two sites of polymorphism in a gene, both of which encode amino acid changes. The two polymorphic residues may lie in close proximity in three-dimensional space (i.e. in the folded structure of the encoded protein). If one of the polymorphic amino acids encoded at each of the two sites has a bulky side chain and the other a small side chain, then one can imagine a situation in which proteins that have either [bulky-small], [small-bulky] or [small-small] pairs of polymorphic residues are fully functional, but proteins with [bulky-bulky] residues at the two sites are impaired, on account of a disruptive shape change caused by the interaction of the two bulky side groups. Now consider a subject whose genotype is heterozygous bulky/small at both polymorphic sites.
The possible haplotype pairs in such a subject are [bulky-small]/[small-bulky] or [small-small]/[bulky-bulky]. The functional implications of these two haplotype pairs are quite different: active/active or active/inactive, respectively. A genotype test would simply reveal that the subject is doubly heterozygous. Only a haplotype test would reveal the biologically consequential structure of the variation. The interaction of polymorphic sites need not involve amino acid changes, of course, but could also involve virtually any combination of polymorphic sites.
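The phase ambiguity in the bulky/small example can be made concrete with a short Python sketch. It enumerates every haplotype pair compatible with an unphased genotype. The function names and the `activity` rule are illustrative assumptions that encode the hypothetical example above; they are not a published algorithm.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of (allele_a, allele_b) tuples, one per polymorphic site.
    """
    pairs = set()
    for phase in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(site[p] for site, p in zip(genotype, phase))
        hap2 = tuple(site[1 - p] for site, p in zip(genotype, phase))
        pairs.add(tuple(sorted((hap1, hap2))))  # store each pair order-independently
    return pairs


def activity(haplotype):
    # Toy rule from the example: only the [bulky-bulky] protein is impaired.
    return "inactive" if haplotype == ("bulky", "bulky") else "active"


double_het = [("bulky", "small"), ("bulky", "small")]
for hap1, hap2 in sorted(haplotype_pairs(double_het)):
    print(hap1, hap2, activity(hap1), activity(hap2))
```

For n heterozygous sites the enumeration yields 2^(n-1) distinct pairs. This is why genotype data alone become ambiguous so quickly.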

[0011] The genetic analysis of complex traits can be made still more powerful by use of schemes to cluster haplotypes into related groups based on parsimony, for example. Templeton and coworkers have demonstrated the power of cladograms for analysis of haplotype data (Templeton, A. R., Boerwinkle, E. and C. F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping. I. Basic Theory and an Analysis of Alcohol Dehydrogenase Activity in Drosophila. Genetics 117: 343-351, 1987. Templeton, A. R., Crandall, K. A. and C. F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping and DNA Sequence Data. III. Cladogram Estimation. Genetics 132: 619-633, 1992. Templeton, A. R. and C. F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping. IV. Nested Analyses with Cladogram Uncertainty and Recombination. Genetics 134: 659-669, 1993. Templeton A. R., Clark A. G., Weiss K. M., Nickerson D. A., Boerwinkle E. and C. F. Sing. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am J Hum Genet. 66: 69-83, 2000). These analyses describe a set of rules for clustering haplotypes into hierarchical groups based on their presumed evolutionary relatedness. Phylogenetic trees can be constructed using standard software packages for phylogenetic analysis such as PHYLIP or PAUP (Felsenstein, J. Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet. 22: 521-65, 1988; Retief, J. D. Phylogenetic analysis using PHYLIP. Methods Mol. Biol. 132: 243-58, 2000), and hierarchical haplotype clustering can be accomplished using the rules described by Templeton and co-workers. The methods described by Templeton and colleagues further provide for a nested analysis of variance between different haplotype groups at each level of clustering.
The results of this analysis can lead to identification of polymorphic sites responsible for phenotypic variation, or at a minimum narrow the possible phenotypically important sites. Thus, methods for determination of haplotypes have great utility in studies designed to test association between genetic variation and variation in phenotypes of medical interest, such as disease risk and prognosis and response to therapy.
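The sketch below illustrates the first step in parsimony-based clustering. It counts site differences between haplotypes and links those that are one mutational step apart. This is a simplified stand-in, not the full Templeton cladogram-estimation and nesting procedure. The haplotype strings and function names are hypothetical.

```python
from itertools import combinations


def site_differences(h1: str, h2: str) -> int:
    """Number of polymorphic sites at which two equal-length haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")
    return sum(a != b for a, b in zip(h1, h2))


def one_step_links(haplotypes: dict) -> list:
    """Pairs of haplotype names separated by a single mutational step."""
    return [
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(sorted(haplotypes.items()), 2)
        if site_differences(s1, s2) == 1
    ]


# Hypothetical haplotypes written as the alleles at five polymorphic sites.
haps = {"H1": "ACGTA", "H2": "ACGTG", "H3": "ATGTG", "H4": "GTCTG"}
print(one_step_links(haps))  # [('H1', 'H2'), ('H2', 'H3')]
```

H4 is not linked at one step. A parsimony network would join it through an unobserved intermediate haplotype.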

[0012] Currently available methods for the experimental determination of haplotypes are unsatisfactory, particularly methods for the determination of haplotypes over long distances (e.g. >5 kb). One of the few experimental haplotyping methods currently in use outside the research group that devised it is based on allele specific amplification using oligonucleotide primers that terminate at polymorphic sites (Newton, C. R. et al. Amplification refractory mutation system for prenatal diagnosis and carrier assessment in cystic fibrosis. Lancet. Dec 23-30; 2 (8678-8679): 1481-3, 1989; Newton, C. R. et al., Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS). Nucleic Acids Res. Vol. 17, 2503-2516, 1989). The method is referred to by the acronym ARMS (for amplification refractory mutation system). The ARMS system was subsequently further developed (Lo, Y. M. et al., Direct haplotype determination by double ARMS: specificity, sensitivity and genetic applications. Nucleic Acids Research July 11; 19(13): 3561-7, 1991) and has since been used in a number of other studies. ARMS is the subject of U.S. Pat. Nos. 5,595,890 and 5,853,989. The drawbacks of this method are that (i) the usual limitations of PCR apply in terms of the difficulty of amplifying long DNA segments; (ii) during amplification cycles, an incompletely extended primer extension product may switch (between one or more cycles) from one allelic template strand to the other, resulting in artefactual hybrid haplotypes; (iii) because different DNA samples will be heterozygous at different combinations of nucleotides, different primers and assay conditions for allele specific amplification must be established for each polymorphic site that is to be haplotyped. For example, consider a locus with five polymorphic sites. Subject A is heterozygous at sites 1, 2 and 4; subject B at sites 2 and 3; and subject C at sites 3 and 5.
Haplotyping A requires allele specific amplification conditions from sites 1 or 4; haplotyping B requires allele specific amplification conditions from sites 2 or 3; and haplotyping C requires allele specific amplification conditions from sites 3 or 5 (with the allele specific primer from site 3 on the opposite strand from that used to haplotype B).
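The five-site example can be reproduced with a short Python sketch. The rule in `candidate_anchors` is inferred from the example itself: an allele-specific primer is anchored at the outermost heterozygous sites, so that one product spans all the others. The sketch also assumes that sites are numbered 5′ to 3′ along the forward strand. Both the rule and the function name are assumptions for illustration.

```python
def candidate_anchors(het_sites):
    """Allele-specific primer positions that could phase all heterozygous sites in
    one product: a forward primer ending at the leftmost heterozygous site, or a
    reverse primer ending at the rightmost one (rule inferred from the example).

    Assumes sites are numbered 5'->3' along the forward strand.
    """
    ordered = sorted(het_sites)
    return {(ordered[0], "forward"), (ordered[-1], "reverse")}


subjects = {"A": {1, 2, 4}, "B": {2, 3}, "C": {3, 5}}
anchors = {name: candidate_anchors(sites) for name, sites in subjects.items()}

# Distinct allele-specific assays that might have to be developed for three subjects.
assays = set().union(*anchors.values())
print(len(assays))  # 6
```

Site 3 appears as a reverse-strand anchor for subject B and as a forward-strand anchor for subject C. This matches the opposite-strand remark above.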

[0013] A similar method for achieving allele specific amplification takes advantage of the ability of some thermostable polymerases to proofread and remove a mismatch at the 3′ end of a primer. Again, primers are designed with the 3′ terminal base positioned opposite to the variant base in the template. In this case the 3′ base of the primer is modified in a way that prevents it from being extended by the 5′-3′ polymerase activity of a DNA polymerase. Upon hybridization of the end-blocked primer to the complementary template sequence, the 3′ base is either matched or mismatched, depending on which alleles are present in the sample. If the 3′ base of the primer is properly base paired, the polymerase does not remove it from the primer; thus the blocked 3′ end remains intact and the primer cannot be extended. However, if there is a mismatch between the 3′ end of the primer and the template, then the 3′-5′ proofreading activity of the polymerase removes the blocked base, the primer can be extended and amplification occurs. This method suffers from the same limitations described above for the ARMS procedure.

[0014] Other allele specific PCR amplification methods include further methods in which the 3′ terminal nucleotide of the primer forms a match with one allele and a mismatch with the other allele (U.S. Pat. No. 5,639,611), PCR amplification and analysis of intron sequences (U.S. Pat. No. 5,612,179 and U.S. Pat. No. 5,789,568), or amplification and identification of polymorphic markers in a chromosomal region of DNA (U.S. Pat. No. 5,851,762). Further, methods for allele-specific reverse transcription and PCR amplification to detect mutations (U.S. Pat. No. 5,804,383), and a primer-specific and mispair extension assay to detect mutations or polymorphisms (PCT/CA99/00733), have been described. Several of these methods are directed to genotyping, not to haplotyping.

[0015] Other haplotyping methods that have been described are based on analysis of single sperm cells (Hubert R., Stanton, V. P. Jr, Aburatani H, et al. Sperm typing allows accurate measurement of the recombination fraction between D3S2 and D3S3 on the short arm of human chromosome 3. Genomics. April 1992; 12(4): 683-687); on limiting dilution of a DNA sample (until only one template molecule is present in each test tube, on average) (Ruano, G., Kidd, K. K. and J. C. Stephens. Haplotype of multiple polymorphisms resolved by enzymatic amplification of single DNA molecules. Proc Natl Acad Sci USA August 1990; 87(16): 6296-6300); or on cloning DNA into various vectors and host microorganisms (U.S. Pat. No. 5,972,614). These methods are not practical for clinical studies of human subjects, and generally have not been used in studies of human disease risk or drug response. For example, sperm based haplotyping methods are not generally useful for clinical studies because no sperm has the same haplotype as its host. Limiting dilution methods are technically challenging (two rounds of PCR amplification are required, with stringent controls for preventing contamination by exogenous DNA) and are not compatible with the high throughput, accuracy and reliability required in human clinical studies.

SUMMARY OF THE INVENTION

[0016] This invention concerns methods for determining the sequence of a DNA sample at a polymorphic site, often referred to as genotyping. Many genotyping methods are known in the art; however, the methods described in this application have the advantages of being robust, highly accurate, and inexpensive to set up and perform. For these reasons the methods described herein are preferable to currently available methods. The genotyping methods described in the specification may be used in the genotyping steps of the haplotyping methods of this invention, or they may be used for genotyping alone, i.e. not associated with a haplotyping test.

[0017] The present invention also concerns methods for determining the organization of DNA sequence polymorphisms on individual chromosomes, i.e. haplotypes, as well as methods for using either genotype or haplotype information, or a combination of the two, to make diagnostic tests useful for disease risk assessment, for prognostic prediction of the course or outcome of a disease, to diagnose a disease or condition, or to select optimal therapy for a disease or condition. As described above, haplotypes are often not directly inferable from genotypes; therefore specialized methods are required to determine haplotypes. Further, as noted, currently available haplotyping methods are cumbersome and/or are limited by the type of samples that can be analyzed. The several haplotyping methods of this invention are superior to previously described methods with respect to technical ease, sample throughput, length of DNA that can be haplotyped, and compatibility with automation. These novel methods provide the basis for more sophisticated analyses of the contribution of variation at candidate genes (such as ApoE) to intersubject variation in medical or other phenotypes of interest. These methods are applicable to patients with a disease or disorder as well as to apparently normal subjects in whom a predisposition to a disease or disorder may be discovered or quantified as a result of a haplotyping test described herein. Application of the haplotyping methods of this invention will provide for improved medical care by increasing the accuracy of genetic diagnostic tests of all kinds.

[0018] This invention further concerns genetic analysis of the ApoE gene to determine disease and drug response traits in humans, particularly traits that may be affected by genetic variation at the ApoE gene, and further concerns methods for improving medical care for individual patients based on the results of ApoE genetic testing. Variation at the ApoE gene has been associated with risk of Alzheimer's disease (AD) and other neurodegenerative diseases, recovery from organic or traumatic brain injury, and response to pharmacotherapy of AD as well as coronary heart disease, dyslipidemia, and other conditions. The methods of this application also provide for more efficient use of medical resources, and therefore are also of use to organizations that pay for health care, such as managed care organizations, health insurance companies and the federal government. The invention provides methods for performing genotyping and haplotyping tests on a human subject to formulate or assist in the formulation of a diagnosis, a prognosis or the selection of an optimal treatment method based on ApoE genotype or haplotype. These methods are applicable to patients with a disease or disorder affecting the cardiovascular or nervous systems, as well as patients with any disease or disorder that is affected by lipid metabolism. The ApoE haplotyping methods of this invention are equally applicable to apparently normal subjects in whom predisposition to a disease or disorder may be discovered as a result of an ApoE genotyping or haplotyping test described herein.
Application of the methods of this invention will provide for improved medical care by, for example, allowing early implementation of preventive measures in patients at risk of diseases such as atherosclerosis, dementia, Parkinson's disease, Huntington's disease or other organic or vascular neurodegenerative processes; or optimal selection of therapy for patients with diseases or conditions such as hyperlipidemia, cardiovascular disease (including coronary heart disease as well as peripheral or central nervous system atherosclerosis), neurological diseases including but not limited to Alzheimer's disease, stroke, head or brain trauma, amyotrophic lateral sclerosis, and psychiatric diseases such as psychosis, bipolar disease and depression.

[0019] Genotyping Methods

[0020] The disadvantages of existing genotyping methods include unproven or inadequate accuracy (particularly for medical research or clinical practice, where very high accuracy is required), high set-up costs (which are unacceptable when relatively small numbers of subjects are being studied, e.g. in the clinical research setting), technical difficulty in performing the test or interpreting the results, and incompatibility with full automation.

[0021] Methods described in the present invention first use amplification (preferably PCR amplification) with amplification oligonucleotides (primers) flanking a polymorphic site. The 3′ end of one of the primers is close to a polymorphic site in template DNA, highly preferably within 16 nucleotides of it. The second primer may lie at any distance from the first primer on the opposite side of the polymorphic site, provided that amplification is effective. The first primer is designed so that it introduces two restriction endonuclease recognition sites into the amplified product during the amplification process. Preferably the two restriction sites are created by inserting a sequence of 15 or fewer nucleotides into the primer. This short inserted sequence in general does not base pair to the template strand, but rather loops out when the primer is bound to template. However, when the complementary strand is copied by polymerase, the inserted sequence is incorporated into the amplicon. Incubation of the resulting amplification product with the appropriate restriction endonucleases results in the excision of a small (generally <20 bases) polynucleotide fragment that contains the polymorphic nucleotide. The small size of the excised fragment allows it to be easily and robustly analyzed by mass spectrometry to determine the identity of the base at the polymorphic site. The primer with the restriction sites can be designed so that the restriction enzymes: (i) are easy to produce, or inexpensive to obtain commercially; (ii) cleave efficiently in the same buffer, i.e. all potential cleavable amplicons are fully cleaved in one step; and (iii) cleave multiple different amplicons, so as to facilitate multiplex analysis (that is, the analysis of two or more samples simultaneously).
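The quantity that the mass spectrometer must resolve can be estimated with a short Python sketch. It uses approximate average residue masses (Da) and the -61.96 Da termini correction from a commonly used oligonucleotide molecular-weight formula, which assumes 5′-OH and 3′-OH ends. A restriction-excised fragment carries a 5′-phosphate, which adds the same mass to both alleles, so the allelic difference is unchanged. The fragment sequence and function names are hypothetical.

```python
# Approximate average masses (Da) of DNA nucleotide residues within a chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# Commonly used correction for an oligonucleotide with 5'-OH and 3'-OH termini.
TERMINI_CORRECTION = -61.96


def oligo_mass(seq: str) -> float:
    """Approximate average molecular weight of a single-stranded DNA oligo."""
    return sum(RESIDUE_MASS[base] for base in seq.upper()) + TERMINI_CORRECTION


def allele_mass_shift(fragment: str, index: int, alt: str) -> float:
    """Mass of the fragment carrying `alt` at `index` minus that of the original."""
    variant = fragment[:index] + alt + fragment[index + 1:]
    return oligo_mass(variant) - oligo_mass(fragment)


# Hypothetical 12-nt excised fragment with an A/G polymorphism at index 6.
fragment = "CTGACGATCCAG"
print(round(allele_mass_shift(fragment, 6, "G"), 2))  # 16.0
```

With these residue masses, the smallest natural allelic difference is A versus T (about 9 Da). That difference sets the resolution the instrument must achieve.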

[0022] An enhancement of the basic method is to select a combination of restriction enzymes that will cleave the amplified product so as to produce staggered ends with a 5′ extension, such that the polymorphic site is contained in the extension. Elimination of natural nucleotides from the reaction (for example using Shrimp Alkaline Phosphatase or other alkaline phosphatase) and addition of at least one modified nucleotide corresponding to one of the two nucleotides present at the polymorphic site (for example 5-bromodeoxyuridine if T is one of the two polymorphic nucleotides) will result in fill-in of the recessed 3′ end to produce fragments differing in mass by more than the natural mass difference of the two polymorphic nucleotides. One or more modified nucleotides can be selected to maximize the differential mass of the two allelic fill-in products. This enhancement of the basic method has the advantage of reducing the mass spectrometric resolution required to reliably determine the presence of two alleles vs. one allele, thereby improving the performance of base-calling software and the ease with which a genotyping system can be automated.
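The gain from a modified nucleotide can be estimated with a short Python sketch. The 5-bromo-dU residue mass used here is an assumption. It is derived by replacing one hydrogen of the dU residue (about 290.17 Da) with bromine, giving roughly 369.06 Da. The other values are approximate average residue masses. The function names are hypothetical.

```python
# Approximate average residue masses (Da). The 5-bromo-dU value is derived by
# replacing one hydrogen of dU (290.17) with bromine: 290.17 - 1.01 + 79.90.
NATURAL = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
BROMO_DU = 290.17 - 1.01 + 79.90  # ~369.06, incorporated opposite template A

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def fill_in_residue_mass(template_base: str, use_brdu: bool) -> float:
    """Mass of the nucleotide added opposite one template base during fill-in."""
    incorporated = COMPLEMENT[template_base]
    if incorporated == "T" and use_brdu:
        return BROMO_DU
    return NATURAL[incorporated]


def allelic_difference(base1: str, base2: str, use_brdu: bool) -> float:
    """Absolute mass difference between the two allelic fill-in products."""
    return abs(fill_in_residue_mass(base1, use_brdu) - fill_in_residue_mass(base2, use_brdu))


# A/T polymorphism in the template overhang.
print(round(allelic_difference("A", "T", use_brdu=False), 2))  # 9.01
print(round(allelic_difference("A", "T", use_brdu=True), 2))   # 55.85
```

Under these assumptions, substituting 5-bromo-dU for T raises the A/T allelic difference roughly six-fold. A C/G site is unaffected because no T is incorporated.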

[0023] Another modification of the basic system is to use a third restriction enzyme that cleaves only one of the two alleles, such that the presence of the site yields shorter fragments than are observed in its absence. Such a modification is not universally applicable because not all polymorphisms alter restriction sites; however, this limitation can be partially addressed by including part of the restriction enzyme recognition site in the primer. For example, an interrupted palindromic recognition site such as that of Mwo I (GCNNNNN/NNGC) can be positioned such that the first GC is in the primer while the second GC includes the polymorphic nucleotide. Only the allele corresponding to GC at the second site will be cleaved. Use of such restriction endonucleases simplifies the sequence requirements at and about the polymorphic site (in this example, all that is required is that one allele at the polymorphic site include the dinucleotide GC), thereby increasing the number of polymorphic sites that can be analyzed in this way.
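The Mwo I example can be checked in silico with a Python regular expression. The pattern encodes GCNNNNN/NNGC, that is, GC, seven arbitrary bases, GC, with the top-strand cut after the seventh base of the site. A lookahead is used so that overlapping sites are all reported. The two allele sequences and the function name are hypothetical.

```python
import re

# Mwo I recognizes GCNNNNNNNGC (GCNNNNN/NNGC); the lookahead finds overlapping sites.
MWO_I = re.compile(r"(?=(GC[ACGT]{7}GC))")


def mwo_i_cut_positions(seq: str) -> list:
    """Top-strand cut positions (0-based index of the first base after each cut)."""
    return [m.start() + 7 for m in MWO_I.finditer(seq.upper())]


# Hypothetical sequences: the first GC comes from the primer; the polymorphic
# base is the G of the second GC (allele 1) or an A (allele 2).
allele_1 = "TTGCTTAGACAGCTT"
allele_2 = "TTGCTTAGACAACTT"
print(mwo_i_cut_positions(allele_1), mwo_i_cut_positions(allele_2))  # [9] []
```

Only allele 1 contains a complete recognition site, so only that allele is cleaved.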

[0024] In additional aspects, the invention provides methods that are applicable to both genotyping and haplotyping. The methods use biased amplification of nucleic acid sequences that include variance sites, and utilize primers that are designed so that a hairpin loop will form, generally in the complementary strand formed in an amplification reaction. The primer is designed to have a mismatch in its 5′ end to a particular nucleotide at a particular site, generally a polymorphic site in a gene. If the particular nucleotide is present at the site, then amplification will be inhibited because the complementary strand formed in the amplification reaction will form a sufficiently stable hairpin loop to effectively compete with binding of the primer, and so inhibit further amplification. In contrast, a variant sequence with a different nucleotide at that site will not form a sufficiently stable hairpin to effectively compete with primer binding.

[0025] Thus, in one aspect, the invention provides a method for biasing the amplification of one allele (e.g., one form of a SNP at a particular site). As explained above, the biasing depends on the identity of a specific nucleotide at a polymorphic site in a target nucleic acid sample. The method involves contacting a segment of DNA with two primers encompassing the polymorphic site under amplification conditions. One primer contains a region at its 5′ end that is not complementary to the target nucleic acid but which, when incorporated into the amplification product, will cause the 3′ end of the strand complementary to this primer in the amplification product to form a sufficiently stable hairpin loop by hybridizing with the sequence including the polymorphic site to inhibit further amplification only if the specific nucleotide is present at the polymorphic site. The method also involves determining whether the segment is amplified. Amplification (or preferential amplification) of the segment is indicative that the polymorphic site contains an alternative to the specific nucleotide.
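The allele dependence of the hairpin can be illustrated with a toy Python model. It counts the Watson-Crick pairs that form when the 3′ end of a strand folds back onto an internal segment containing the polymorphic site. This is only a pair count. A real primer design would instead compare hairpin and primer-binding free energies with a nearest-neighbor thermodynamic model. All sequences and the function name are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stem_pairs(strand: str, internal_start: int, stem_len: int) -> int:
    """Count Watson-Crick pairs if the 3'-terminal `stem_len` bases of `strand`
    fold back onto strand[internal_start:internal_start + stem_len].

    Both segments are read 5'->3'; antiparallel pairing matches the first base
    of the internal segment with the last base of the strand.
    """
    tail = strand[-stem_len:]
    internal = strand[internal_start:internal_start + stem_len]
    return sum(
        1 for a, b in zip(internal, reversed(tail)) if b == a.translate(COMPLEMENT)
    )


# Hypothetical strands: the 3' end (copied from the primer's 5' tail) is the
# reverse complement of the allele-1 segment CCTGAAGC.
tail = "GCTTCAGG"
loop = "TTTTTTTTTT"
allele_1 = "AT" + "CCTGAAGC" + loop + tail  # perfect 8-bp stem: hairpin favored
allele_2 = "AT" + "CCTGGAGC" + loop + tail  # A->G creates a G/T mismatch in the stem

print(stem_pairs(allele_1, 2, 8), stem_pairs(allele_2, 2, 8))  # 8 7
```

In this toy case, allele 1 forms the perfect stem, so its amplification would be the one inhibited. An internal mismatch usually destabilizes a short stem far more than the loss of one pair suggests.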

[0026] In particular embodiments, the nucleic acid sample can be single-stranded DNA or double-stranded DNA, and can be genomic DNA or cDNA. RNA can also be utilized, preferably by forming cDNA.

[0027] In certain embodiments, the amplification of the segment is detected by detecting defined-size fragments following restriction enzyme digestion of any amplification products. The polymorphic site can be a restriction fragment length polymorphism (RFLP), and a digestion can be performed with a restriction enzyme corresponding to the RFLP, where the defined-size fragments differ in size depending on the nucleotide present at the polymorphic site.
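The size readout of an RFLP digest can be predicted with a short Python sketch. EcoRI (G^AATTC) is used only as a familiar example enzyme. The amplicons are hypothetical, and only top-strand lengths are reported, ignoring the 4-nt overhangs.

```python
def digest_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Top-strand fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left of the top-strand cut
    (1 for EcoRI, G^AATTC).
    """
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 46-bp amplicons differing at one base inside an EcoRI site.
allele_1 = "A" * 20 + "GAATTC" + "T" * 20
allele_2 = "A" * 20 + "GAGTTC" + "T" * 20
print(digest_lengths(allele_1, "GAATTC", 1), digest_lengths(allele_2, "GAATTC", 1))  # [21, 25] [46]
```

Because the EcoRI site is palindromic, searching one strand is enough. A non-palindromic site would also require a search of the reverse complement.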

[0028] The method is not restricted to a single site, so in preferred embodiments, the method involves carrying out the contacting and determining for each of a plurality of different polymorphic sites. For example, at least 2, 3, 4, 5, 6, 8, 10, 15, 20, 30, 40, 50, or 100 sites can be analyzed in a coordinated set of determinations (e.g., in genotyping an individual for a plurality of different sites, which may be in one or a plurality of different genes). In certain embodiments, the plurality of different polymorphic sites provides a haplotype for a gene, can independently or also include at least one polymorphic site in a plurality of different genes, and/or provides haplotypes for a plurality of different genes.

[0029] Such biased amplification can be used to determine the nucleotide present at a particular polymorphic site. Thus, in a related aspect, the invention provides a method for determining whether a particular nucleotide is present at a polymorphic site in a target nucleic acid sequence, by contacting a segment of DNA containing the polymorphic site with a primer under amplification conditions, such that extension products and/or amplification products will be formed. The primer has a sequence at its 5′ end that is the same as a sequence including the polymorphic site for a particular nucleotide present at that site. For a specific nucleotide at the polymorphic site, the opposite strand extension product or amplification product will form a sufficiently stable hairpin loop, by hybridization between a sequence including the polymorphic site and a sequence derived from the 5′ end of the primer, to inhibit amplification. Amplification is not inhibited for an alternative nucleotide at said site. The method also includes determining whether the segment is amplified. Amplification of the segment indicates that the polymorphic site contains an alternative nucleotide instead of the specific nucleotide. In general, a second primer, constituting a primer pair, is also used under amplification conditions such that extension products or amplification products or both will be formed. Particular embodiments include those described for the aspect above.

[0030] Haplotyping Methods

[0031] This invention concerns methods for determining the sequence of individual chromosomes, starting with diploid DNA that contains two chromosomes, and methods for using that information to make genetic tests useful for disease risk assessment, for diagnosing a disease or condition, for assessing disease prognosis, or for selecting optimal therapy for a disease or condition. The sequence of a chromosome segment is referred to as a haplotype. Since homologous chromosome segments (e.g. the sequences of the two alleles of the ApoE gene) are very similar in sequence (>99% identical), the distinguishing elements of haplotypes occur at polymorphic sites. A haplotype can be thought of as the nucleotide sequence of a DNA segment at some or all of the sites that vary in a population. Thus a haplotype may consist of the sequence at 10 polymorphic sites in a 5,000-nucleotide DNA segment.
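Treating a haplotype as the nucleotides of one chromosome segment read at selected polymorphic sites, the idea can be sketched as follows. The 1-based positions and the 10-nucleotide segment are hypothetical toy values chosen only to keep the example short.

```python
def haplotype_at_sites(segment: str, sites: list) -> str:
    """Read one chromosome segment at the given 1-based polymorphic positions,
    in ascending position order, and return the concatenated nucleotides."""
    for pos in sites:
        if not 1 <= pos <= len(segment):
            raise IndexError(f"position {pos} lies outside the segment")
    return "".join(segment[pos - 1] for pos in sorted(sites))


segment = "ACGTACGTAC"  # toy stand-in for a 5,000-nucleotide segment
print(haplotype_at_sites(segment, [9, 2, 5]))  # CAA
```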

[0032] The pattern of genetic variation in most species, including man, is not random; as a result of human evolutionary history some sets of polymorphisms occur together on chromosomes, so that knowing the sequence of one polymorphic site may allow one to predict with some probability the sequence of certain other sites on the same chromosome. Once the relationships between a set of polymorphic sites have been worked out, a subset of all the polymorphic sites may be used in the development of a haplotyping test. In preferred embodiments of the haplotyping methods of this invention, a subset of all the polymorphic sites at a locus is used to develop a haplotyping test. The polymorphisms that comprise a haplotype may be of any type.

[0033] Most polymorphisms (about 90% of all DNA polymorphisms) involve the substitution of one nucleotide for another, and are referred to as single nucleotide polymorphisms (SNPs). The other main type of polymorphism involves change in the length of a DNA segment as a result of an insertion or deletion of anywhere from one nucleotide to thousands of nucleotides. Insertion/deletion polymorphisms (also referred to as indels) account for most non-SNP polymorphisms. Common kinds of indels include variation in the length of homopolymeric sequences (e.g. AAAAAA vs. AAAAA), variation in the number of short tandem repeat sequences such as CA (e.g. 13 repeats of CA vs. 15 repeats), and variation in the number of more complex repeated sequences (sometimes referred to as VNTR polymorphisms, for variable number of tandem repeats), as well as any other type of inter-individual variation in the length of a given DNA segment. The repeat units may also vary in sequence.
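The distinction between SNPs and the indel subtypes listed above can be sketched as a comparison of two allele sequences of the same variable segment. The sketch assumes the alleles are given without flanking sequence and, as a simplification, does not handle repeat units that vary in sequence; the function names are hypothetical.

```python
def repeat_unit(seq: str):
    """Shortest unit whose whole-number repetition equals seq
    (seq itself if no shorter unit works; None for an empty string)."""
    for k in range(1, len(seq) + 1):
        if len(seq) % k == 0 and seq[:k] * (len(seq) // k) == seq:
            return seq[:k]
    return None


def classify(allele_a: str, allele_b: str) -> str:
    """Classify the difference between two alleles of one variable segment."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b:
        return "no variation"
    if len(a) == len(b):
        diffs = sum(x != y for x, y in zip(a, b))
        return "SNP" if diffs == 1 else f"{diffs} substitutions"
    unit_a, unit_b = repeat_unit(a), repeat_unit(b)
    if unit_a is not None and unit_a == unit_b:
        return f"indel (tandem repeat length, unit {unit_a})"
    return "indel"


print(classify("ACGT", "ACTT"))          # SNP
print(classify("AAAAAA", "AAAAA"))       # indel (tandem repeat length, unit A)
print(classify("CA" * 13, "CA" * 15))    # indel (tandem repeat length, unit CA)
```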

[0034] Haplotypes are often not directly inferable from genotypes (except in the special case of families, where haplotypes can often be inferred by analysis of pedigrees); therefore, specialized methods are required for determining haplotypes from samples derived from unrelated subjects. Currently available haplotyping methods are cumbersome and expensive, and are limited either by the type of samples that can be analyzed (e.g. sperm cells) or by the limitations of PCR or other DNA amplification methods. The limits of DNA amplification methods such as PCR include incomplete allele-specificity of priming when a 3′ terminal primer mismatch is used to achieve allele discrimination (as in the ARMS method); that is, there may be some amplification of the non-selected allele. PCR is also limited in the length of DNA segment that can be amplified.

[0035] The present application provides methods for determining the haplotypes present in a DNA sample or cDNA sample, preferably drawn from one subject; however, these methods may also be used to determine the population of haplotypes present in a complex mixture, such as may be produced by mixing DNA samples from multiple subjects. The methods described herein are applicable to genetic analysis of any diploid organism, or any polyploid organism in which there are only two unique alleles. Application of the methods of this invention will provide for improved genetic analysis, enabling advances in medicine, agriculture and animal breeding. For example, by improving the accuracy of genetic tests for diagnosing predisposition to disease, or for predicting response to medical therapy, it will be possible to make safer and more efficient use of appropriate preventive or therapeutic measures in patients. The methods of this invention also provide for improved genetic analysis in a variety of basic research problems, including the identification of alleles of human genes that are associated with disease risk or disease prognosis.

[0036] Certain methods for determining haplotypes present in a DNA sample from a diploid organism include the following steps: (i) genotyping at least a portion (meaning a sequence portion) of the sample to identify sites of heterozygosity; (ii) enriching for an allele, by a method not requiring amplification, to a ratio of at least 1.5:1 from a starting ratio of 1:1, where the information from (i) is used to select a preferred or optimal heterozygous site or sites for allele enrichment; (iii) genotyping the enriched material to determine the nucleotides present at said heterozygous site or sites; and (iv) determining the haplotype of the enriched allele by inspecting the genotypes from (iii). This method may further include determining the haplotype of the non-enriched allele by comparing the genotype determined in step (i) with the haplotype determined in step (iv). Such a haplotyping method may include additional steps, including (a) performing an allele enrichment procedure for the second allele on the same starting material; (b) genotyping the enriched material for the second allele to determine the nucleotides present at said heterozygous site or sites; and (c) determining the haplotype of the enriched second allele by inspecting the genotypes from (b).
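Steps (iii) and (iv), together with the complement comparison against step (i), can be sketched as follows. The sketch assumes that step (iii) yields a relative signal per nucleotide at each heterozygous site (for example, peak areas); the data layout and function name are hypothetical, and the 1.5:1 threshold is the minimum enrichment ratio named in the text.

```python
MIN_RATIO = 1.5  # minimum allele ratio after enrichment, per the text


def phase_from_enrichment(het_genotypes: dict, enriched_signals: dict,
                          min_ratio: float = MIN_RATIO):
    """Infer both haplotypes at heterozygous sites.

    het_genotypes:    {position: (nucleotide1, nucleotide2)} from step (i)
    enriched_signals: {position: {nucleotide: signal}} from step (iii)
    Returns (enriched_haplotype, other_haplotype), each {position: nucleotide}.
    """
    enriched, other = {}, {}
    for pos, (n1, n2) in sorted(het_genotypes.items()):
        s1 = enriched_signals[pos].get(n1, 0.0)
        s2 = enriched_signals[pos].get(n2, 0.0)
        strong, weak = max(s1, s2), min(s1, s2)
        if strong == 0:
            raise ValueError(f"site {pos}: no signal for either nucleotide")
        if weak > 0 and strong / weak < min_ratio:
            raise ValueError(f"site {pos}: ratio below {min_ratio}:1, cannot phase")
        major, minor = (n1, n2) if s1 >= s2 else (n2, n1)
        enriched[pos] = major  # step (iv): enriched allele read from step (iii)
        other[pos] = minor     # complement: genotype from (i) minus enriched allele
    return enriched, other


het = {100: ("A", "G"), 250: ("C", "T")}
signals = {100: {"A": 0.8, "G": 0.2}, 250: {"C": 0.25, "T": 0.75}}
print(phase_from_enrichment(het, signals))
# ({100: 'A', 250: 'T'}, {100: 'G', 250: 'C'})
```

Requiring every heterozygous site to pass the ratio check is a conservative choice made for this sketch; it refuses to phase a site rather than guess when enrichment is weak.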

[0037] Additional methods for determining the haplotypes present in DNA from a diploid organism include the following steps: (i) genotyping at least a portion of the DNA in a sample from said organism to identify sites of heterozygosity; (ii) performing an allele-selective amplification procedure on the sample such that the allele ratio is changed from a starting ratio of 1:1 to at least 1.5:1, wherein the information from (i) is used to select an optimal polymorphic site or sites for designing primers to achieve said allele-selective amplification; (iii) genotyping the selectively amplified material; and (iv) determining the haplotype of the selectively amplified allele by inspecting the genotypes. Methods may further include determining the haplotype of the selectively non-amplified allele by comparing the genotype determined in (i) with the haplotype determined in (iv). In addition, methods may include determining the haplotype of the selectively non-amplified allele by (a) performing an allele-selective amplification procedure for the second allele using the same starting material; (b) genotyping the selectively amplified second-allele material; and (c) determining the haplotype of the selectively amplified second allele by inspecting the genotypes.

[0038] Also, methods for determining the haplotypes present in DNA from a diploid organism include (i) genotyping at least a portion of a DNA sample from said organism to identify sites of heterozygosity that affect restriction enzyme cleavage sites; (ii) digesting the DNA with a restriction endonuclease, using natural or synthetic endonucleases, such that one allele is restricted at a specific site and the other is not; (iii) performing an amplification procedure on the sample, using the information from step (i) to select optimal sites for designing primers to achieve allele-selective amplification; (iv) genotyping the selectively amplified material; and (v) determining the haplotype of the selectively amplified allele by inspecting the genotypes. These haplotyping methods further include determining the haplotype of the selectively non-amplified allele by comparing the genotype determined in step (i) with the haplotype determined in step (v). In addition, methods may include (a) isolating the second allele utilizing size difference; (b) genotyping the size-selected material corresponding to the second allele; and (c) determining the haplotype of the size-selected second allele by inspecting the genotypes.

[0039] Still further methods for determining the haplotypes present in DNA from a diploid organism include the steps of (i) genotyping at least a portion of the DNA from the sample to identify sites of heterozygosity that affect restriction enzyme cleavage sites; (ii) digesting the DNA with a restriction endonuclease, using natural or synthetic endonucleases, such that only one allele is restricted at a specific polymorphic site, thereby creating partially overlapping allele 1 and allele 2 fragments of different length, wherein information from (i) is utilized to select a restriction site that produces a useful difference in allele length; (iii) separating the restricted molecules according to their size by electrophoresis or centrifugation, such that the two allelic restriction fragments are resolved, and isolating DNA molecules corresponding to the size of allele 1 and, optionally, allele 2; (iv) genotyping the size-selected material corresponding to allele 1 and, optionally, allele 2; and (v) determining the haplotype of the size-selected allele 1 by inspecting the genotypes. These methods may include determining the haplotype of allele 2 by comparing the genotypes determined in (i) with the haplotype determined in (v).

[0040] Additional embodiments of methods for haplotyping double-stranded DNA fragments include (i) genotyping at least a portion of a DNA sample to identify sites of heterozygosity in the DNA fragment of interest; (ii) immobilizing double-stranded DNA fragments on a solid support; (iii) adding two or more components that bind at polymorphic sites in the immobilized DNA fragment of interest to produce a detectable structure, under conditions that promote preferential binding to only one strand of the target immobilized fragment; and (iv) determining the location of target fragments. In these methods, the two or more components may be two or more oligonucleotides complementary to polymorphic sites in the immobilized DNA fragment of interest. The components are added under conditions that promote D-loop formation in the case of oligonucleotides perfectly matched to one strand of the target immobilized fragment, but not in the case of oligonucleotides containing one or more mismatched nucleotides. The formation of D-loops may be enhanced by the addition of RecA protein or, alternatively, by altering the salt concentration of the mixture. The two or more components may further include two or more peptide nucleic acids (PNAs) or two or more zinc finger proteins. In methods using PNAs, the peptide nucleic acids are complementary to polymorphic sites in the immobilized DNA fragment of interest, and are added under conditions that promote D-loop formation in the case of PNAs perfectly matched to one strand of the target immobilized fragment, but not in the case of PNAs containing one or more mismatched nucleotides. In methods using zinc finger proteins, proteins that bind to one of two alleles at a polymorphic nucleotide may be used, and are added as described for the oligonucleotide components. The two or more zinc finger proteins can be detectably labeled.
The immobilized target DNA fragments may first be subjected to a size selection procedure and/or immobilized on a prepared glass surface. These methods may then be used to determine the location of the target fragments by optical mapping. In this more specific detection method, the two or more oligonucleotides are detectably labeled.

[0041] Further embodiments include a method for determining the haplotypes of DNA fragments present in a DNA sample from a diploid organism, including: a) selectively amplifying one haplotype from the mixture by the allele-specific clamp PCR procedure; and b) determining the genotype of two or more polymorphic sites in the amplified DNA fragment. The selective amplification may be preceded by determining the genotype of the DNA sample at two or more polymorphic sites in order to devise an optimal genotyping approach. The DNA sample may also be a mixture of several DNA samples.

[0042] Additional haplotyping methods and embodiments of this invention are described in the Detailed Description below.

[0043] APOE Genotyping and Haplotyping

[0044] Several United States patents relate to methods for determining ApoE haplotype and using that information to predict whether a patient is likely to develop late onset type Alzheimer's Disease (U.S. Pat. Nos. 5,508,167 and 5,716,828), whether a patient with cognitive impairment is likely to respond to a cholinomimetic drug (U.S. Pat. No. 5,935,781), or whether a patient with a non-Alzheimer's neurological disease is likely to respond to therapy (U.S. Pat. No. 5,508,167).

[0045] The ApoE test practiced in all the cited patents (and in virtually all other publications) is based on a classification of ApoE into three alleles, termed epsilon 2, epsilon 3 and epsilon 4 (abbreviated e2, e3 and e4). These three alleles are distinguishable on the basis of two polymorphic sites in the ApoE gene, and the status of both sites must be tested to determine the alleles present in a subject. The two polymorphic sites are at nucleotides 448 and 586 of the ApoE cDNA (numbering from GenBank accession K00396), corresponding to amino acids 112 and 158 of the processed ApoE protein. The nucleotide polymorphism at both sites is T vs. C, and at both sites it is associated with a cysteine vs. arginine amino acid polymorphism, wherein the codon with T encodes cysteine and the codon with C encodes arginine. The presence of T at both polymorphic sites (cysteine at both residues 112 and 158) is designated e2; T at position 448 and C at position 586 (cysteine at 112, arginine at 158) is designated e3; and C at both variable sites (arginine at both 112 and 158) is designated e4. These three alleles (as well as rarer alleles) occur in virtually all human populations, with the frequency of the alleles varying from population to population. The e3 allele is the commonest in all populations, while the frequency of e2 and e4 varies. Numerous studies have demonstrated associations between ApoE alleles and risk of various diseases or biochemical abnormalities. For example, the e4 allele is associated with risk of late onset Alzheimer's disease and elevated serum cholesterol.
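The e2/e3/e4 rules above can be sketched as a lookup on the nucleotides at cDNA positions 448 and 586 of one chromosome. Returning "unclassified" for the fourth combination (C at 448, T at 586), which is not one of the three alleles defined here, is a choice made for this sketch; the function names are hypothetical.

```python
# (nucleotide at cDNA 448, nucleotide at cDNA 586) -> allele, K00396 numbering.
# T encodes cysteine and C encodes arginine at residues 112 and 158.
APOE_ALLELES = {
    ("T", "T"): "e2",  # Cys112, Cys158
    ("T", "C"): "e3",  # Cys112, Arg158
    ("C", "C"): "e4",  # Arg112, Arg158
}


def classify_apoe(nt448: str, nt586: str) -> str:
    """Classify one ApoE haplotype (one chromosome) from its two sites."""
    return APOE_ALLELES.get((nt448.upper(), nt586.upper()), "unclassified")


def apoe_genotype(hap1: tuple, hap2: tuple) -> str:
    """Combine two phased haplotypes, each (nt448, nt586), into e.g. 'e3/e4'."""
    return "/".join(sorted([classify_apoe(*hap1), classify_apoe(*hap2)]))


print(apoe_genotype(("C", "C"), ("T", "C")))  # e3/e4
```

Because `apoe_genotype` takes phased haplotypes, it sidesteps the phase question. From unphased genotypes alone, a subject heterozygous T/C at both sites could carry either e2 plus e4, or e3 plus the unclassified combination.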

[0046] It has been apparent for several years that the e2, e3, e4 classification does not provide sufficient sensitivity or specificity to be used alone as a diagnostic test for assessing risk of, or making a diagnosis of, dyslipidemia, heart disease or Alzheimer's disease (AD) in asymptomatic individuals. Even the use of ApoE testing as a tool in the differential diagnosis of dementia (e.g. to increase the certainty of a clinical diagnosis of Alzheimer's type dementia in a patient with early signs of dementia in whom the diagnosis of Alzheimer's is being considered) is debated. Thus, while many important associations between ApoE genotype and medically important conditions or treatment responses have been described and repeatedly confirmed, it is evident that the strength of these associations is not as great as would be desirable for a routine predictive, diagnostic or prognostic test, and in fact may not be sufficient to justify ApoE genetic testing for any non-research purpose.

[0047] The lack of sensitivity and specificity that limits the use of current ApoE genotype tests is likely attributable to two factors. First, the current ApoE test may not measure all the functional variation in the ApoE gene. For example, it does not take full account of any genetically determined variation in transcription regulation; variation in RNA processing (including splicing, polyadenylation and export to the cytoplasm); variation in mRNA translational efficiency and half-life; or variation in protein activity, including receptor binding, interaction with regulatory factors, half-life, etc. This is particularly true insofar as such variation may be determined by polymorphisms other than those that account for the e2, e3, e4 classification. Second, there may be variables besides ApoE allele status that affect the various conditions for which ApoE genotyping has been tested. Other relevant variables for neurodegenerative diseases such as AD include variation in the genes that encode protein components of AD lesions, such as tau protein or amyloid precursor protein; the proteases that produce pathological forms of these proteins, such as beta and gamma secretase and the memapsins; AD disease genes such as presenilin 1 and 2; genes involved in brain inflammatory response pathways; and other groups of genes implicated in neurodegeneration by biochemical, genetic or epidemiological evidence. Variables that may interact with ApoE genotype or haplotype to affect cholesterol and triglyceride levels and heart disease risk include the genes encoding ApoE receptors (low density lipoprotein receptor and the low density lipoprotein receptor related protein), genes encoding other apolipoproteins and their receptors, and the genes of cholesterol biosynthesis, including hydroxymethylglutaryl CoA reductase, mevalonate synthetase, mevalonate kinase, phosphomevalonate kinase, squalene synthase and other enzymes.

[0048] The present invention addresses the first limitation of current ApoE testing (failure of current ApoE tests to record all the alleles of ApoE that have distinct biochemical or clinical effects) by providing a much more sensitive test of ApoE variation. Specifically, we describe 20 DNA polymorphisms in and around the ApoE gene (including the two polymorphisms that are traditionally studied). We also describe the commonly occurring haplotypes at the ApoE locus (that is, the sets of polymorphic nucleotides that occur together on individual chromosomes) and novel methods for determining haplotypes in clinical samples. Also described are data analysis strategies for extracting the maximum information from the ApoE haplotypes, so as to enhance their utility in clinical settings.

[0049] The ApoE haplotypes include any haplotype that can be assembled from the sequence polymorphisms described herein in Table 2, or any subset of those polymorphisms. Thus, the invention expressly includes a haplotype including either of the alternative nucleotides at any 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 of the identified polymorphic sites. The haplotypes expressly include each combination of sites with each selection of alternative nucleotide at each site included in the haplotype. The haplotypes may also include one or more additional polymorphic sites which are known in the art or which may be identified in the future. Among the haplotypes described below are a set of haplotypes that parallel the current e2, e3, e4 classification but do not involve either of the nucleotides that specify the e2, e3, e4 system.

[0050] The present invention also addresses the second potential limitation of current ApoE testing: failure to test for the interaction of ApoE genotype or haplotype with other genetic determinants of nervous system disease or cardiovascular disease risk, prognosis or response to therapy. The phenotypes for which ApoE genotyping or haplotyping have been tested are determined by multiple genes, and therefore require the simultaneous analysis of variation in two or more genetic loci. The haplotyping methods of this application facilitate such analysis by providing a basis for (i) identifying substantially all haplotypes that exist at appreciable frequency in a population or populations, and (ii) clustering said haplotypes in groups of two or more haplotypes to facilitate statistical analysis, thereby increasing the power of association studies.

[0051] As used herein, “population” refers to a group of individuals that share geographic (including, but not limited to, national), ethnic or racial heritage. A population may also comprise individuals with a particular disease or condition (“disease population”). The concept of a population is useful because the occurrence and/or frequency of DNA polymorphisms and haplotypes, as well as their medical implications, often differ between populations. Therefore, knowing the population to which a subject belongs may be useful in interpreting the health consequences of having specific haplotypes. A population preferably encompasses at least ten thousand, one hundred thousand, one million or more individuals, with the larger numbers being more preferable. In embodiments of this invention, the allele (haplotype) frequency, heterozygote frequency, or homozygote frequency of two or more alleles of a gene or genes is known in a population. In preferred embodiments of this invention, the frequency of one or more variances that may predict response to a treatment is determined in one or more populations using a diagnostic test.
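The allele, heterozygote and homozygote frequencies referred to above can be computed from a list of per-individual genotypes. A minimal sketch, using a hypothetical four-person sample purely as an arithmetic illustration (the text calls for far larger populations); the function name is hypothetical.

```python
from collections import Counter


def population_frequencies(genotypes: list):
    """genotypes: one (allele1, allele2) tuple per individual.

    Returns (allele_frequencies, heterozygote_frequency, homozygote_frequency).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    counts = Counter(allele for pair in genotypes for allele in pair)
    allele_freq = {allele: c / (2 * n) for allele, c in counts.items()}
    het = sum(1 for a, b in genotypes if a != b) / n
    return allele_freq, het, 1 - het


sample = [("e3", "e3"), ("e3", "e4"), ("e2", "e3"), ("e4", "e4")]
freqs, het, hom = population_frequencies(sample)
print(freqs)     # {'e3': 0.5, 'e4': 0.375, 'e2': 0.125}
print(het, hom)  # 0.5 0.5
```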

[0052] In one aspect, the invention provides a method for determining a genotype for ApoE in an individual, comprising determining the nucleotide present at at least one polymorphic site other than nucleotides 21250 and 21388 in an ApoE allele from an individual. In preferred embodiments, the polymorphic site is selected from the group consisting of nucleotides 16541, 16747, 16965, 17030, 17098, 17387, 17785, 17874, 17937, 18145, 18476, 19311, 20234, 21349, 23524, 23707, 23759, 23805, and 37237. In certain embodiments, the method also comprises determining the nucleotide present at at least one of nucleotides 21250 and 21388. The determining can be performed by a method comprising variance-specific nucleic acid hybridization. The variance-specific nucleic acid hybridization can be performed on an array, preferably an array composed of immobilized oligonucleotides or in situ synthesized oligonucleotides, in which case the hybridizing species are DNA fragments. In certain embodiments, the DNA fragments are PCR amplification products. In some embodiments, the array is composed of immobilized DNA fragments and the hybridizing species are oligonucleotides.

[0053] Determining the nucleotide present at a polymorphic site can be performed using a primer extension method that distinguishes between the nucleotides present at said at least one site, for example, a method using dideoxynucleotides to effect nucleic acid chain termination. The determining can alternatively be performed using a method involving chemical cleavage of a nucleic acid molecule including said polymorphic site. The masses of the nucleic acid fragments produced by said chemical cleavage are preferably determined using mass spectrometry.
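Why mass spectrometry can read a polymorphic site from cleavage fragments can be illustrated with average residue masses. This sketch assumes the common oligonucleotide molecular-weight approximation (average residue masses of 313.21, 289.18, 329.21 and 304.20 Da for A, C, G and T, minus 61.96 Da for a fragment with hydroxyl groups at both ends). Actual fragment termini, and hence absolute masses, depend on the cleavage chemistry, which the text does not specify; the fragment sequence is hypothetical. The quantity of interest is the shift: a C to T change, as at the ApoE sites, adds about 15 Da.

```python
# Average residue masses (Da) from a widely used oligonucleotide MW approximation.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # assumes 5'-OH and 3'-OH termini


def fragment_mass(seq: str) -> float:
    """Approximate average mass of a single-stranded DNA fragment."""
    return sum(RESIDUE_MASS[base] for base in seq.upper()) + END_CORRECTION


def allele_mass_shift(fragment: str, offset: int, alt: str) -> float:
    """Mass change when the base at 0-based offset is replaced by alt."""
    variant = fragment[:offset] + alt + fragment[offset + 1:]
    return fragment_mass(variant) - fragment_mass(fragment)


print(round(allele_mass_shift("GCGCAC", 1, "T"), 2))  # 15.02
```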

[0054] In other embodiments, determining the nucleotide present at a polymorphic site is performed using a cleavase-based signal amplification method.

[0055] The nucleotide determination can also be performed using a bead-based method, preferably one in which the beads carry a bound oligonucleotide species that is perfectly matched, or mismatched by one base, to the target.

[0056] Alternatively, the determining can be performed using a FRET-based method.

[0057] In another aspect, the invention provides a method for determining a haplotype for ApoE in an individual, by genotyping at least two polymorphic sites in ApoE sequence on at least one allele of said individual, preferably where at least one of said polymorphic sites is different from nucleotides 21250 and 21388. As in the preceding aspect, in preferred embodiments, the polymorphic sites include at least one site selected from the group consisting of nucleotides 16541, 16747, 16965, 17030, 17098, 17387, 17785, 17874, 17937, 18145, 18476, 19311, 20234, 21349, 23524, 23707, 23759, 23805, and 37237.

[0058] In preferred embodiments, the genotyping is performed on two alleles of said individual.

[0059] In preferred embodiments, the genotyping is performed for at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 of the polymorphic sites.

[0060] Embodiments of the preceding two aspects can also be applied in connection with additional aspects, particularly aspects concerning ApoE described herein.

[0061] The invention also provides a method for classifying ApoE haplotypes for a plurality of individuals, by determining at least one ApoE haplotype for each of the plurality of individuals, determining the sequence similarity of the haplotypes (using methods for determining sequence similarity known to those of ordinary skill in the art), and assigning the haplotypes to groups of haplotypes based on said sequence similarities. This method thus constructs groups of related ApoE haplotypes based on sequence relationship.
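The text leaves the similarity measure to known methods. One simple choice, assumed here for illustration only, scores haplotypes by Hamming distance over the same ordered set of polymorphic sites and joins them by single linkage: a haplotype within `max_distance` differences of any member of a group joins that group, and groups it bridges are merged. The haplotype strings and function names are hypothetical.

```python
def hamming(h1: str, h2: str) -> int:
    """Number of polymorphic sites at which two haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same ordered sites")
    return sum(a != b for a, b in zip(h1, h2))


def group_haplotypes(haplotypes: list, max_distance: int = 1) -> list:
    """Single-linkage grouping of haplotypes by Hamming distance."""
    groups = []
    for hap in haplotypes:
        linked = [g for g in groups
                  if any(hamming(hap, member) <= max_distance for member in g)]
        merged = [hap]
        for g in linked:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return sorted(sorted(g) for g in groups)


haps = ["AACT", "AACC", "GGTT", "GGTA", "AGCT"]
print(group_haplotypes(haps))
# [['AACC', 'AACT', 'AGCT'], ['GGTA', 'GGTT']]
```

Single linkage keeps the example short; other clustering rules (for example, phylogenetic trees) would fit the same step.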

[0062] Further, the invention provides a method for providing an indication of the risk for an individual to develop a disease or condition, by determining a haplotype of ApoE in the individual, where the haplotype provides a measure of the risk.

[0063] In preferred embodiments of this aspect and other aspects relating to ApoE and a disease, the disease is selected from the group consisting of coronary heart disease, a non-Alzheimer's Disease neurological disease, Alzheimer's disease, stroke, brain trauma, amyotrophic lateral sclerosis, temporal lobe epilepsy, Wilson's disease, continuous ambulatory peritoneal dialysis, glycogen storage disease type Ia, and age-related macular degeneration.

[0064] The method (and other methods described herein relating to ApoE and disease) can also include determining a genotype or haplotype of at least one additional gene, where the haplotype of ApoE together with the genotype or haplotype of the additional gene(s) provides a measure of the risk.

[0065] The invention provides a method for diagnosing the presence of a disease in an individual, by determining whether the individual has an ApoE haplotype associated with the disease.

[0066] In preferred embodiments, the method also includes determining a genotype or haplotype of at least one additional gene and determining whether the individual has a combination of the haplotype of ApoE and the genotype or haplotype of the at least one additional gene associated with the disease.

[0067] Likewise, the invention provides a method for predicting the clinical course for a patient suffering from a disease, by determining an ApoE haplotype for the patient, where at least one ApoE haplotype is associated with the clinical course of the disease.

[0068] In preferred embodiments, the clinical course comprises a treatment prognosis for a particular method of treatment, or comprises at least one clinical disease parameter selected from the group consisting of rate of disease development, time interval to death, time interval to dementia, and time interval to inability to live independently.

[0069] The invention also provides a method for selecting a subject for prophylactic treatment of a disease, by identifying a subject having an ApoE haplotype associated with an elevated risk of developing the disease, wherein said prophylactic treatment can provide a clinical benefit to the subject.

[0070] The invention also provides a method for selecting a patient for treatment of a disease, involving determining whether the patient has an ApoE haplotype associated with favorable clinical prognosis with a particular treatment.

[0071] Similarly, the invention provides a method for selection of a treatment for a patient suffering from a disease. The method involves determining an ApoE haplotype for the patient and identifying a treatment associated with favorable clinical prognosis for a patient having that ApoE haplotype.

[0072] As ApoE haplotype is associated with treatment selection and prognosis, the invention also provides a method of treating a patient suffering from a disease, by determining an ApoE haplotype for the patient, identifying a treatment associated with favorable clinical prognosis for a patient having that ApoE haplotype, and administering that treatment to the patient.

[0073] ApoE haplotype and genotype information also can be utilized in identifying individuals, or the individual source of a biological sample. Thus, the invention provides a method for determining whether a biological sample was from an individual, by determining the nucleotides present at a plurality of ApoE polymorphic sites in the individual and in DNA obtained from the sample, and determining whether the nucleotides present at the polymorphic sites are the same or different. The presence of the same nucleotides at respective sites is indicative that said sample is from said individual, and the presence of different nucleotides is indicative that said sample is not from said individual. The ApoE genotype or haplotype information can also be usefully combined with similar information for polymorphic sites in other genes or other nucleic acid sequences from the individual and the sample. In preferred embodiments, the plurality of ApoE polymorphic sites comprises an ApoE haplotype.
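The comparison can be sketched as a site-by-site check of unordered genotypes. In this sketch, sites typed in only one of the two sources are skipped, and the three-way result ("excluded", "consistent", "undetermined") is a hypothetical convention; as the text says, a match is indicative of a common source rather than proof of it. Site labels and genotypes are hypothetical.

```python
def compare_sources(individual: dict, sample: dict) -> str:
    """individual, sample: {site: genotype}, where a genotype such as 'CT'
    lists the two nucleotides observed at that site in either order."""
    shared = [site for site in individual if site in sample]
    if not shared:
        return "undetermined"
    for site in shared:
        if sorted(individual[site]) != sorted(sample[site]):
            return "excluded"  # a differing site indicates a different source
    return "consistent"


person = {"s1": "CT", "s2": "TT", "s3": "CC"}
print(compare_sources(person, {"s1": "TC", "s2": "TT", "s3": "CC"}))  # consistent
print(compare_sources(person, {"s1": "CT", "s2": "TT", "s3": "CT"}))  # excluded
```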

[0074] The invention also provides a method for determining whether an ApoE haplotype is associated with a disease risk. This method involves determining ApoE haplotypes for each individual in a set of individuals, dividing the set of individuals into at least two groups based on ApoE haplotypes, and determining whether individuals having a particular ApoE haplotype or individuals in a group differ from individuals having a different ApoE haplotype or in a different group in incidence, prevalence, severity, or progression, or a combination thereof, of disease. This aspect can also be combined with embodiments of other aspects described herein involving ApoE and disease, disease treatment and other such aspects.
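The grouping-and-comparison step can be sketched as tallying incidence per haplotype group and testing a two-group difference. Pearson's chi-square on a 2x2 table is one common choice, assumed here for illustration; with small counts an exact test would be more appropriate. The counts in the example are hypothetical, and the function names are not from the text.

```python
import math


def incidence_by_group(records):
    """records: iterable of (group_label, affected) pairs.
    Returns {group: (n_individuals, n_affected, incidence)}."""
    tallies = {}
    for group, affected in records:
        n, k = tallies.get(group, (0, 0))
        tallies[group] = (n + 1, k + int(bool(affected)))
    return {g: (n, k, k / n) for g, (n, k) in tallies.items()}


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square without continuity correction for the table
    [[a, b], [c, d]] (rows: two groups; columns: affected, unaffected).
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical: 30 of 100 in haplotype group 1 affected vs 15 of 100 in group 2.
stat, p = chi_square_2x2(30, 70, 15, 85)
print(round(stat, 2))  # 6.45
print(p < 0.05)        # True
```

The same tabulation applies unchanged when groups are defined by combinations of ApoE haplotype and the genotype of another gene.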

[0075] The invention also provides a method for determining whether a combination of an ApoE haplotype and a genotype or haplotype of at least one additional gene is associated with a disease risk. The method includes determining ApoE haplotypes and genotypes or haplotypes for the at least one additional gene for each individual in a set of individuals, dividing the set of individuals into at least two groups based on the combinations of ApoE haplotypes and genotype or haplotype of said at least one additional gene, and determining whether individuals having a particular combination or individuals in a group differ from individuals having a different combination or in a different group in incidence, prevalence, severity, or progression, or a combination thereof, of said disease.

[0076] The invention further provides a method for determining whether an ApoE haplotype is associated with a pharmacologic parameter, by measuring the parameter for cells of at least one individual with said ApoE haplotype, measuring the parameter for cells of at least one individual with a different ApoE haplotype, and comparing the measures. Preferably a larger number of individuals, e.g., at least 3, 5, 10, 20, 30, 50, 100, or even more, are utilized, thereby providing additional correlation information. Correlation or another statistical measure of relatedness between haplotype and pharmacologic parameter can be used by one of ordinary skill in the art.

[0077] As used herein, “polymorphism” refers to DNA sequence variation in the cellular genomes of plants or animals, preferably mammals, and more preferably humans. These sequence variations include mutations, single nucleotide changes, and insertions and deletions. “Single nucleotide polymorphism” (SNP) refers to those differences among samples of DNA in which a single nucleotide base pair has been substituted by another.

[0078] As used herein, “variance” or “variants” is synonymous with polymorphism, and refers to DNA sequence variations. The terms “variant form of a gene”, “form of a gene”, or “allele” refer to one specific sequence of a gene that has at least two sequences, the specific forms differing from other forms of the same gene at at least one, and frequently more than one, variant site within the gene. The sequences at these variant sites that differ between different alleles of the gene are variously termed “alleles”, “gene sequence variances”, “variances” or “variants”. The term “alternative form” refers to an allele that can be distinguished from other alleles by having distinct variances at at least one, and frequently more than one, variant site within the gene sequence. Other terms known in the art to be equivalent include mutation and polymorphism, although mutation is often used to refer to an allele associated with a deleterious phenotype.

[0079] As used herein, “phenotype” refers to any observable or otherwise measurable physiological, morphological, biological, biochemical or clinical characteristic of an organism. The point of genetic studies is to detect consistent relationships between phenotypes and DNA sequence variation (genotypes). DNA sequence variation will seldom completely account for phenotypic variation, particularly with medical phenotypes of interest (e.g. commonly occurring diseases); environmental factors are also frequently important.

[0080] As used herein, “genotype” refers to the genetic constitution of an organism. More specifically, “genotyping” as used herein refers to the analysis of DNA in a sample obtained from a subject to determine the DNA sequence in a specific region of the genome, e.g. at a gene that influences a disease or drug response. The term “genotyping” may refer to the determination of DNA sequence at one or more polymorphic sites.

[0081] As used herein, “haplotype” refers to the partial or complete sequence of a segment of DNA from a single chromosome. The DNA segment may include part of a gene, an entire gene, several genes, or a region devoid of genes (but which perhaps contains DNA sequence that regulates the function of nearby genes). The term “haplotype”, then, refers to a cis arrangement of two or more polymorphic nucleotides on a particular chromosome, e.g., in a particular gene. The haplotype preserves information about the phase of the polymorphic nucleotides, that is, which set of variances was inherited from one parent (and is therefore on one chromosome), and which from the other. A genotyping test does not provide information about phase. For example, a subject heterozygous at nucleotide 25 of a gene (both A and C are present) and also at nucleotide 100 of the same gene (both G and T are present) could have haplotypes 25A-100G and 25C-100T, or alternatively 25A-100T and 25C-100G. Only a haplotyping test can discriminate these two cases definitively. Haplotypes are generally inherited as units, except in the event of a recombination during meiosis that occurs within the DNA segment spanned by the haplotype, a rare occurrence for any given sequence in each generation. By “haplotyping”, or “determining the haplotype”, as used herein is meant determining the sequence of two or more polymorphic sites on a single chromosome. Usually the sample to be haplotyped consists initially of two admixed copies of the chromosome segment to be haplotyped, i.e. DNA from a diploid subject.
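The phase ambiguity described above grows with the number of heterozygous sites: a subject heterozygous at k sites has 2^(k-1) distinct pairs of haplotypes consistent with the same genotype. The sketch below enumerates those pairs; the function name `possible_haplotype_pairs` and the tuple input format are illustrative assumptions, and the example reproduces the 25A/C, 100G/T case from the text.

```python
from itertools import product


def possible_haplotype_pairs(het_sites):
    """List every phase assignment consistent with an unphased genotype.

    het_sites: list of (position, allele_1, allele_2) for heterozygous sites.
    Returns (haplotype_1, haplotype_2) pairs, each haplotype written like
    "25A-100G". Pairs that differ only by swapping the two chromosomes are
    listed once, so k sites give 2 ** (k - 1) pairs.
    """
    if not het_sites:
        return []
    pairs = []
    # Fix the phase of the first site so each pair is not listed twice.
    first_pos, first_a, first_b = het_sites[0]
    for flips in product((0, 1), repeat=len(het_sites) - 1):
        hap1 = [f"{first_pos}{first_a}"]
        hap2 = [f"{first_pos}{first_b}"]
        for (pos, a, b), flip in zip(het_sites[1:], flips):
            hap1.append(f"{pos}{b if flip else a}")
            hap2.append(f"{pos}{a if flip else b}")
        pairs.append(("-".join(hap1), "-".join(hap2)))
    return pairs


for pair in possible_haplotype_pairs([(25, "A", "C"), (100, "G", "T")]):
    print(pair)
```

A genotyping test cannot choose between the printed pairs; a haplotyping test determines which one is actually present.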

[0082] As used herein, “genetic testing” or “genetic screening” refers to the genotyping or haplotyping analyses performed to determine the alleles present in an individual, a population, or a subset of a population.

[0083] “Disease risk” as used herein refers to the probability that, for a specific disease (e.g. coronary heart disease), an individual who is free of evident disease at the time of testing will subsequently be affected by the disease.

[0084] “Disease diagnosis” as used herein refers to the ability of a clinician to appropriately determine and identify whether the expressed symptomatology, pathology or physiology of a patient is associated with a disease, disorder, or dysfunction.

[0085] “Disease prognosis” as used herein refers to the forecast of the probable course and/or outcome of a disease, disorder, or dysfunction.

[0086] “Therapeutic management” as used herein refers to the treatment of diseases, disorders, or dysfunctions by various medical methods. By “disease management protocol” or “treatment protocol” is meant a means for devising a therapeutic plan for a patient using laboratory, clinical and genetic data, including the patient's diagnosis and genotype. The protocol clarifies therapeutic options and provides information about probable prognoses with different treatments. The treatment protocol may provide an estimate of the likelihood that a patient will respond positively or negatively to a therapeutic intervention. The treatment protocol may also provide guidance regarding optimal drug dose and administration, and likely timing of recovery or rehabilitation. A “disease management protocol” or “treatment protocol” may also be formulated for asymptomatic and healthy subjects in order to forecast future disease risks based on laboratory, clinical and genetic variables. In this setting the protocol specifies optimal preventive or prophylactic interventions, including use of compounds, changes in diet or behavior, or other measures. The treatment protocol may include the use of a computer program.

[0087] The term “associated with”, in connection with the relationship between a genetic characteristic (e.g., a gene, allele, haplotype, or polymorphism) and a disease or condition, means that there is a statistically significant level of relatedness between them based on any generally accepted statistical measure of relatedness. Those skilled in the art are familiar with selecting an appropriate statistical measure for a particular experimental situation or data set. The genetic characteristic, e.g., the gene or haplotype, may, for example, affect the incidence, prevalence, development, severity, progression, or course of the disease. For example, ApoE or a particular allele(s) or haplotype of the gene is related to a disease if the ApoE gene is involved in the disease or condition as indicated, or if a particular sequence variance, haplotype, or allele is so involved.

[0088] As used herein, a “gene” is a sequence of DNA present in a cell that directs the expression of a “biologically active” molecule or “gene product”, most commonly by transcription to produce RNA and translation to produce protein. Such a gene may also be manipulated by many different molecular biology techniques, and thus, for example, can be isolated or purified or otherwise separated from its natural environment. The “gene product” is most commonly an RNA molecule or protein, or an RNA or protein that is subsequently modified by reacting with, or combining with, other constituents of the cell. Such modifications may include, without limitation, modification of proteins to form glycoproteins, lipoproteins, and phosphoproteins, or other modifications known in the art. RNA may be modified, without limitation, by polyadenylation, splicing, capping or export from the nucleus, or by covalent or noncovalent interactions with proteins. The term “gene product” refers to any product directly resulting from transcription of a gene. In particular this includes partial, precursor, and mature transcription products (i.e., pre-mRNA and mRNA), and translation products with or without further processing including, without limitation, lipidation, phosphorylation, glycosylation, or combinations of such processing.

[0089] As used herein, the term “hybridization”, when used with respect to DNA fragments or polynucleotides, encompasses methods using natural polynucleotides, non-natural polynucleotides, or a combination of both. Natural polynucleotides are polymers of the four natural deoxynucleotides (deoxyadenosine triphosphate [dA], deoxycytidine triphosphate [dC], deoxyguanosine triphosphate [dG] or deoxythymidine triphosphate [dT], usually designated simply thymidine triphosphate [T]) or polymers of the four natural ribonucleotides (adenosine triphosphate [A], cytidine triphosphate [C], guanosine triphosphate [G] or uridine triphosphate [U]). Non-natural polynucleotides are made up in part or entirely of nucleotides that are not natural nucleotides; that is, they have one or more modifications. Also included among non-natural polynucleotides are molecules related to nucleic acids, such as peptide nucleic acid [PNA]. Non-natural polynucleotides may be polymers of non-natural nucleotides, polymers of natural and non-natural nucleotides (in which there is at least one non-natural nucleotide), or otherwise modified polynucleotides. Non-natural polynucleotides may be useful because their hybridization properties differ from those of natural polynucleotides. As used herein, the term “complementary”, when used in respect to DNA fragments, refers to the base pairing rules established by Watson and Crick: A pairs with T or U; G pairs with C. Complementary DNA fragments have sequences that, when aligned in antiparallel orientation, conform to the Watson-Crick base pairing rules at all positions or at all positions except one. As used herein, complementary DNA fragments may be natural polynucleotides, non-natural polynucleotides, or a mixture of natural and non-natural polynucleotides.
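The definition of “complementary” above (antiparallel alignment, Watson-Crick pairing at all positions or at all positions except one) can be expressed as a short check. This is a sketch that handles only natural bases; the function names are hypothetical, and non-natural nucleotides would need their own pairing rules.

```python
# Watson-Crick partners as stated in the definition: A pairs with T or U; G pairs with C.
PAIR = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatches_antiparallel(strand1, strand2):
    """Count non-Watson-Crick positions when strand2 is aligned antiparallel to strand1.

    Both strands are written 5'->3', so position i of strand1 faces
    position len - 1 - i of strand2.
    """
    if len(strand1) != len(strand2):
        raise ValueError("strands must be the same length")
    s1, s2 = strand1.upper(), strand2.upper()
    return sum((a, b) not in PAIR for a, b in zip(s1, reversed(s2)))


def is_complementary(strand1, strand2, max_mismatches=1):
    """True if the strands pair at all positions, or at all positions except one."""
    return mismatches_antiparallel(strand1, strand2) <= max_mismatches


print(is_complementary("GGATG", "CATCC"))  # perfect antiparallel match
print(is_complementary("GGATG", "GATGC"))  # two mismatched positions
```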

[0090] As used herein, “amplify”, when used with respect to DNA, refers to a family of methods for increasing the number of copies of a starting DNA fragment. Amplification of DNA is often performed to simplify subsequent determination of DNA sequence, including genotyping or haplotyping. Amplification methods include the polymerase chain reaction (PCR), the ligase chain reaction (LCR) and methods using Q beta replicase, as well as transcription-based amplification systems such as the isothermal amplification procedure known as self-sustained sequence replication (3SR, developed by T. R. Gingeras and colleagues), strand displacement amplification (SDA, developed by G. T. Walker and colleagues) and the rolling circle amplification method (developed by P. Lizardi and D. Ward).

[0091] By “comprising” is meant including, but not limited to, whatever follows the word “comprising”. Thus, use of the term “comprising” indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present. By “consisting of” is meant including, and limited to, whatever follows the phrase “consisting of”. Thus, the phrase “consisting of” indicates that the listed elements are required or mandatory, and that no other elements may be present. By “consisting essentially of” is meant including any elements listed after the phrase, and limited to other elements that do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements. Thus, the phrase “consisting essentially of” indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present depending upon whether or not they affect the activity or action of the listed elements.

[0092] Other features and advantages of the invention will be apparent from the following description of the preferred embodiments thereof, and from the claims.

DETAILED DESCRIPTION OF THE INVENTION

BRIEF DESCRIPTION OF THE FIGURES AND TABLES

[0093] Table 1 The table lists the masses of the normal nucleotides and BrdU, and the mass differences between each of the possible pairs of nucleotides.
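The pairwise comparison summarized in Table 1 can be sketched as follows. The residue masses below are commonly cited approximate average masses of nucleotides within a DNA chain (nucleoside monophosphate minus water). They are assumptions for illustration, not values copied from Table 1. The BrdU value is estimated by replacing the 5-methyl group of dT with bromine.

```python
from itertools import combinations

# Approximate average residue masses (Da) within a DNA chain. These are assumed
# illustrative values; Table 1 of this application is the reference.
RESIDUE_MASS = {
    "dA": 313.21,
    "dC": 289.18,
    "dG": 329.21,
    "dT": 304.20,
    "BrdU": 369.07,  # estimate: dT with its 5-methyl group replaced by bromine
}


def pairwise_mass_differences(masses):
    """Return {(x, y): |m_x - m_y|} for every unordered pair, smallest difference first."""
    diffs = {
        (x, y): round(abs(masses[x] - masses[y]), 2)
        for x, y in combinations(masses, 2)
    }
    return dict(sorted(diffs.items(), key=lambda item: item[1]))


for pair, delta in pairwise_mass_differences(RESIDUE_MASS).items():
    print(pair, delta)
```

With these values, the A/T pair differs by only about 9 Da, making it the hardest pair to resolve. Substituting BrdU for T widens the gap to dA to roughly 56 Da, which illustrates why a mass-modified nucleotide (as in FIG. 7) can simplify discrimination.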

[0094] Table 2 Twenty polymorphic sites in the ApoE gene. The ApoE genomic sequence is taken from GenBank accession AB012576. The gene is composed of four exons and three introns. The transcription start site (beginning of the first exon) is at nucleotide (nt) 18,371 of GenBank accession AB012576, while the end of the transcribed region (end of the 3′ untranslated region, less the polyA tract) is at nt 21,958. The twenty polymorphic sites are depicted as shaded nucleotides in the Table, and are as follows (nucleotide position and possible nucleotides): 16541 (T/G); 16747 (T/G); 16965 (T/C); 17030 (G/C); 17098 (A/G); 17387 (T/C); 17785 (G/A); 17874 (T/A); 17937 (C/T); 18145 (G/T); 18476 (G/C); 19311 (A/G); 20334 (A/G); 21250 (C/T); 21349 (T/C); 21388 (T/C); 23524 (A/G); 23707 (A/C); 23759 (C/T); 23805 (G/C); and 37237 (G/A). The bold sequence indicates the transcribed sequence of the ApoE gene; the grey shaded region indicates the ApoE gene enhancer element; the underlined sequence depicts the coding region of the ApoE gene. Where polymorphisms result in a change of the amino acid sequence, the amino acid alteration is indicated; for example, at nucleotide position 20334 the A/G polymorphism results in an alanine/threonine (A/T) substitution at amino acid position 18 of the ApoE gene product. As described in the Detailed Description below, the polymorphisms at GenBank nucleotide positions 17874, 17937, 18145, 18476, 21250, and 21388 have been previously described.
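The coordinates given above (transcription start at nt 18,371 and end of the transcribed region at nt 21,958 of AB012576) allow each polymorphic site to be placed relative to the transcript. This is a minimal sketch; the function name is hypothetical, and it assumes the convention in which the transcription start is +1 and the nucleotide immediately upstream is -1 (no position 0).

```python
TSS = 18_371             # first transcribed nucleotide (GenBank AB012576 numbering)
TRANSCRIPT_END = 21_958  # end of the 3' untranslated region, excluding the polyA tract


def locate_site(pos):
    """Return (region, offset) for a GenBank AB012576 position.

    Transcribed positions are numbered +1, +2, ... from the transcription start.
    Upstream positions are negative (-1 is immediately 5' of +1). Downstream
    positions are given as the distance past the end of the transcribed region.
    """
    if pos < TSS:
        return ("upstream", pos - TSS)
    if pos <= TRANSCRIPT_END:
        return ("transcribed", pos - TSS + 1)
    return ("downstream", pos - TRANSCRIPT_END)


for site in (16541, 18476, 21250, 23707, 37237):
    print(site, locate_site(site))
```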

[0095] Table 3 This table provides experimentally derived ApoE haplotypes. The haplotypes encompass nine polymorphic sites within the ApoE gene (GenBank accession number AB012576), and the Table has nine columns with haplotype data at these nine sites. The column listed as “WWP #” gives a Coriell number, which is the catalogue number of an established human cell line. The “VGNX_Symbol” row provides an internal identifier for the gene; the “VGNX database” row identifies the base pair number of the ApoE cDNA; and the “GenBank” row identifies the GenBank base pair number of the sequence for the ApoE gene. The abbreviations are as follows: A=adenine nucleotide, C=cytosine nucleotide, G=guanine nucleotide, and T=thymine nucleotide. Nucleotides in brackets indicate that either nucleotide may be present in the sample. Thus, for example, under column GEN-CBX and WWP#1, the genotype identified at GenBank position 17874 is an “A”, whereas under column GEN-CBX at GenBank position 18476 the genotype for WWP#1 is either a “T” or a “G”.

[0096] Table 4 This table provides the sequence of ApoE haplotypes comprising up to 20 polymorphic sites. There are 42 ApoE haplotypes listed in the Table. The top row of the table provides the location of the polymorphic nucleotides in the ApoE gene (see Table 2). The numbers (16541, 16747, and so forth) correspond to the numbering in GenBank accession AB012576_1, which provides the sequence of a cosmid clone that contains the entire ApoE gene and flanking DNA. Each column shows the sequence of the ApoE gene at the position indicated at the top of the column. Abbreviations are as follows: A=adenine nucleotide, C=cytosine nucleotide, G=guanine nucleotide, and T=thymine nucleotide. Each row provides the sequence of an individual haplotype.

[0097] Table 5 This table provides the sequence of haplotypes of the ApoE gene determined by 5 polymorphic sites. These haplotypes allow classification of ApoE alleles into the e2, e3 and e4 groups without recourse to the polymorphic sites conventionally used to determine e2, e3, e4 status. In this table the haplotypes are specified by SNPs at positions 16747, 17030, 17785, 19311, and 23707, listed as column headings. The GENOTYPE column provides the classic ApoE genotype/phenotype (e2, e3 or e4) corresponding to the haplotype indicated in each row.

[0098] FIG. 1 Depiction of a primer designed to incorporate recognition sites for the restriction enzymes Fok I and Fsp I. The primer (primer R sequence) has bases altered from the desired amplified region of the target DNA. The polymorphic nucleotide is included in the target DNA region, as indicated by the arrow. After PCR amplification, the altered bases of the primer are incorporated into the amplicon, thereby introducing FokI and FspI restriction sites. The amplicon can subsequently be digested with the FokI and FspI restriction enzymes under conditions optimal for digestion by both enzymes. The resultant fragments after enzyme digestion, an 8-mer and a 12-mer, are as depicted. In this figure, the polymorphism (A, in italic) is contained within the 12-mer fragment.
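Because the polymorphic base lies in the 12-mer, the two alleles yield 12-mers whose masses differ by the mass difference of the two bases. The sketch below estimates single-strand fragment masses from composition. It assumes approximate average residue masses and the standard end chemistry of restriction fragments (5′-phosphate, 3′-hydroxyl). The 12-mer sequences are hypothetical and are not the sequence shown in FIG. 1.

```python
# Approximate average residue masses (Da); assumed illustrative values.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
WATER = 18.02
HPO3 = 79.98


def fragment_mass(seq, five_prime_phosphate=True):
    """Approximate average mass (Da) of a single-stranded DNA fragment.

    Restriction enzymes leave a 5'-phosphate and a 3'-hydroxyl, so the mass is
    the sum of residue masses plus one water. For a 5'-hydroxyl terminus
    (e.g. a synthetic oligonucleotide), HPO3 is subtracted.
    """
    mass = sum(RESIDUE_MASS[base] for base in seq.upper()) + WATER
    if not five_prime_phosphate:
        mass -= HPO3
    return round(mass, 2)


# Hypothetical 12-mers that differ only at the polymorphic (last) position.
allele_a = "ACGTACGTACGA"
allele_g = "ACGTACGTACGG"
print(fragment_mass(allele_a), fragment_mass(allele_g))
print(round(fragment_mass(allele_g) - fragment_mass(allele_a), 2))
```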

[0099] FIG. 2 This figure depicts the utility of Fok I, a type IIS restriction enzyme, which cleaves DNA outside the recognition sequence at a distance of 9 bases 3′ to the recognition site on one strand and 13 bases away from the recognition site on the opposite strand, leaving a four-base overhang (protruding 5′ end). As shown in this figure, by designing the primer so that the Fok I recognition site is located within 12 bases or less of the 3′ end of the primer, one can ensure that Fok I will cleave outside the primer sequence. Further shown is the utility of FspI, a restriction enzyme that leaves blunt ends after digestion. Digestion at the FspI recognition site, TGCGCA, results in fragments as shown.
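The cleavage geometry described for FIG. 2 can be turned into cut coordinates. This sketch handles only recognition sites written on the top strand (GGATG for FokI, TGCGCA for FspI). FokI sites on the opposite strand (CATCC on the top strand) are not handled. The function names are hypothetical.

```python
FOKI_SITE = "GGATG"   # FokI: GGATG(N)9/13, cuts 3' of the site
FSPI_SITE = "TGCGCA"  # FspI: TGC^GCA, blunt


def foki_cuts(top):
    """Return (top_cut, bottom_cut) for each GGATG on the top strand.

    A cut at index k falls between top[k-1] and top[k]. The bottom-strand cut
    is expressed in the same top-strand coordinates. Sites too close to the 3'
    end for both cuts to fall within the sequence are skipped.
    """
    cuts = []
    start = top.find(FOKI_SITE)
    while start != -1:
        end = start + len(FOKI_SITE)        # index just after the recognition site
        top_cut, bottom_cut = end + 9, end + 13
        if bottom_cut <= len(top):
            cuts.append((top_cut, bottom_cut))  # 4-base 5' overhang between the two
        start = top.find(FOKI_SITE, start + 1)
    return cuts


def fspi_cuts(top):
    """Blunt cut positions (between TGC and GCA) for each FspI site."""
    return [i + 3 for i in range(len(top) - 5) if top[i:i + 6] == FSPI_SITE]


seq = "GGATG" + "A" * 20
print(foki_cuts(seq))            # overhang length is bottom_cut - top_cut
print(fspi_cuts("AATGCGCAAA"))
```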

[0100] FIG. 3 In this figure, the utility of the Fsp I/Fok I pair of enzymes for the present invention is shown. The FspI recognition site overlaps that of Fok I, allowing the two sites to be partially combined. Thus, including the combined FspI/FokI sequence in the primer reduces the number of bases that must be introduced into the modified primer, making the primer design simpler and the primer more likely to function in the subsequent amplification reaction.
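The saving from overlapping the two sites can be sketched by merging the recognition sequences at their longest suffix/prefix overlap. GGATG (FokI) ends in TG, which is also the start of TGCGCA (FspI), so one possible combined sequence is 9 bases instead of 11. This is an illustration under that assumption only; the exact combined sequence used in the primers is the one shown in FIG. 3, and the function name is hypothetical.

```python
def merge_with_overlap(left, right):
    """Join two sites, overlapping the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no overlap: simple concatenation


combined = merge_with_overlap("GGATG", "TGCGCA")
print(combined, len(combined))
print("GGATG" in combined, "TGCGCA" in combined)  # both sites are preserved
```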

[0101] FIG. 4 In this figure, an alternative method of primer design in the present invention involves the use of a primer with an internal loop. The primer is designed (primer R1) such that one of the bases corresponding to the native sequence is removed and replaced with a loop. In this case the G/C indicated by the arrow below the target sequence is replaced with the recognition sequence for Fok I and Fsp I. Upon hybridization to the DNA template, the primer will form a loop structure. This loop will be incorporated into the amplicon during the amplification process, thereby introducing the Fok I and Fsp I restriction sites (indicated by the box). The resultant amplicon is incubated with Fok I and Fsp I under optimal digestion conditions, producing an 8-mer and a 12-mer fragment. As in FIG. 1, the 12-mer contains the polymorphic base (A in italic) and can be analyzed by mass spectrometry to identify the base at the polymorphic site.

[0102] FIG. 5 Alternative restriction enzyme recognition site incorporation into amplified regions of target DNA is shown. As depicted in FIGS. 1-4 for the enzyme pair FspI/FokI, in this figure BsgI/PvuII restriction sites can be incorporated in the same manner. A primer is designed such that the BsgI/PvuII sites form a hair-pin loop when the primer is hybridized to the target DNA sequence. After amplification by PCR, the resultant amplicon will have the BsgI/PvuII sites incorporated (as indicated by the boxed sequence). After digestion under conditions optimal for PvuII and BsgI, the resultant fragments, a 14-mer and a 16-mer, are sufficient for mass spectrometric analysis, and the polymorphic site is contained in the 16-mer (A, in italic).

[0103] FIG. 6 Shown in this figure is an alternative restriction enzyme pair for the preparation of fragments containing the polymorphic site for mass spectrometric analysis. As shown in the figure, the primer has PvuII/FokI restriction enzyme recognition sites that form a hair-pin loop when hybridized to the target DNA sequence. After amplification by PCR, the resultant amplicon will have the PvuII/FokI sites incorporated (as indicated by the boxed sequence). After digestion under conditions optimal for PvuII and FokI, the resultant fragments, a 16-mer and a 20-mer, are sufficient for mass spectrometric analysis, and the polymorphic site is contained in the 20-mer (A, in italic).

[0104] FIG. 7 In this figure, a modification of the method depicted in FIG. 4 is shown. As in FIG. 4, a DNA segment containing a polymorphism is amplified using two primers. One primer is designed with an inserted DNA segment, not complementary to the template DNA, that forms a hair-pin loop when hybridized to the template DNA. Insertion of the non-complementary DNA segment results in incorporation of overlapping FokI and FspI restriction enzyme sites after PCR amplification (as shown in the boxed sequence). Following the PCR amplification reaction, the reaction is subjected to a clean-up procedure to remove unincorporated primers, nucleotides and buffer constituents. The PCR product is then digested with the FokI restriction enzyme, which generates a 5′ overhang that extends from the 3′ end of the primer to beyond the polymorphic nucleotide. The recessed 3′ end can then be filled in with exogenously added nucleotides in which the normal nucleotide corresponding to one of the possible bases at the polymorphic site is replaced by a mass-modified nucleotide (T^mod). The resulting fragments are suitable for mass spectrometric analysis of the modified polymorphic nucleotide.

[0105] FIG. 8 Shown in this figure is the incorporation of a single restriction enzyme recognition site in the amplicon for subsequent digestion and mass spectrometric analysis of the prepared fragments. Specifically, the figure shows incorporation of a site for BcgI, a restriction enzyme that is capable of making two double-strand cuts, one on the 5′ side and one on the 3′ side of its recognition site. The recognition site for BcgI is 12/10(N)CGA(N)₆TGC(N)12/10; digestion yields fragments sufficient for mass spectrometric analysis and identification of the polymorphic base within the fragment.

[0106] FIG. 9 Shown in this figure is an example of the utility, in the present invention, of including a recognition site for a restriction enzyme that creates a nick in the DNA amplicon instead of causing a double-strand break. As shown in this figure, a primer R is designed to incorporate an N.BstNB I recognition site (GAGTCNNNN^NN) in addition to a FokI restriction site. As in previous figures, the primer forms a hair-pin loop structure when hybridized to the target DNA region, and the PCR amplicon incorporates the restriction site sequences. Digestion with FokI and N.BstNB I results in a 10-mer fragment that contains the polymorphic base (T in italic). Such a fragment is suitable for analysis using a mass spectrometer.
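The nick position implied by the N.BstNB I site notation GAGTCNNNN^NN can be computed in the same way as the FokI cuts: the enzyme breaks only the strand carrying GAGTC, four bases 3′ of the site. This minimal sketch scans the top strand only. The function name is hypothetical, and the example sequence is invented.

```python
def nbstnbi_nicks(top):
    """Nick positions for N.BstNB I (GAGTCNNNN^NN) on the top strand only.

    A nick at index k breaks the phosphodiester bond between top[k-1] and
    top[k]; the complementary strand is left intact.
    """
    site = "GAGTC"
    nicks = []
    i = top.find(site)
    while i != -1:
        nick = i + len(site) + 4       # four N's after GAGTC, then the nick
        if nick < len(top):
            nicks.append(nick)
        i = top.find(site, i + 1)
    return nicks


seq = "GAGTC" + "ACGT" + "TTTT"
for k in nbstnbi_nicks(seq):
    print(seq[:k], "^", seq[k:])
```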

[0107] FIG. 10 Shown in this figure is a strategy similar to the nicking enzyme scheme of FIG. 9. This method uses one restriction enzyme and a primer that contains a ribonucleotide substituted for one of the deoxyribonucleotides. As shown, the primer is designed to contain a FokI recognition site, which forms a hair-pin loop upon hybridization with the target DNA sequence. The primer also has a ribonucleoside (rG) substitution, which will likewise be incorporated into the amplicon. The ribonucleoside linkage is base-labile and will cause a break in the DNA backbone at that site under basic conditions. In this scheme, the amplicon is first incubated with the restriction enzyme (Fok I), causing a double-strand break. The amplicon is then incubated in the presence of base, causing a break between the ribonucleotide G and the 3′ deoxyribonucleotide T and releasing a 7-base fragment that can easily be analyzed by mass spectrometry.

[0108] FIG. 11 The diagram illustrates the major approaches to haplotyping within the allele separation group of allele enrichment methods, described in section A of the specification. As shown, methods can be broadly categorized as (1) those directed to single-stranded DNA and (2) those directed to double-stranded DNA. It is possible to capture DNA fragments in an allele-specific manner by affinity to proteins or nucleic acids that discriminate single base differences. Different types of protein and nucleic acid affinity reagents are shown in the boxes. The protein or nucleic acid that binds one allele can subsequently be selected from the nucleic acid mixture by methods known in the art, such as streptavidin- or antibody-coated beads. A third, non-affinity-based method for separating alleles involves restriction endonuclease cleavage at a polymorphic site (such that fragments of significantly different size are produced from the two alleles), followed by size fractionation of the cleaved products using electrophoresis or centrifugation. Genotyping the isolated fragments corresponding to the two alleles will provide haplotypes.

[0109] FIG. 12 This diagram depicts the various methods one skilled in the art could employ for haplotyping based on allele-specific amplification. After cleavage of one allele, the other allele may be selectively amplified, or separated by a size selection procedure, or the cleaved allele may be removed by an allele-selective degradation procedure.

[0110] FIG. 13 This diagram depicts the categorization of the various haplotyping strategies, based upon allele-specific restriction, that one could employ. In these methods one allele is preferentially amplified from a mixture of two alleles by the design of a primer or primers that exploit sequence differences at polymorphic sites.

[0111] FIG. 14 Hair pin loop primers. In this figure the primers used for PCR amplification are shown. In allele 1, the polymorphic site is a T (italic), and incorporation of the ATCTGGA 5′ portion of the primer occurs after at least one round of amplification. In allele 2, the polymorphic site is a C (italic), and incorporation of the ATCTGGA 5′ portion of the primer likewise occurs after at least one round of amplification.

[0112] FIG. 15 Hair pin loop primers. In this figure the primers used for PCR amplification are shown. In allele 1, the polymorphic site is a C (italic), and incorporation of the ATCCGGA 5′ portion of the primer occurs after at least one round of amplification. In allele 2, the polymorphic site is a T (italic), and incorporation of the ATCCGGA 5′ portion of the primer likewise occurs after at least one round of amplification.

[0113] FIG. 16 Hair pin loop primers. In this figure, the minus strand of allele 1 generated by the PCR amplification step shown in FIG. 14 is shown; the 5′ primer is unable to hybridize to it, which effectively prevents amplification of allele 1 using the T primer. In contrast, the minus strand of allele 2 is incapable of forming a hairpin loop due to the mismatch. Thus, hairpin loop formation and the resulting block to PCR amplification do not occur, and amplification of this allele 2 strand will proceed using the T primer.

[0114] FIG. 17 Hair pin loop primers. In this figure, the minus strand of allele 2 generated by the PCR amplification step shown in FIG. 15 is shown; the 5′ primer is unable to hybridize to it, which effectively prevents amplification of allele 2 using the C primer. In contrast, the minus strand of allele 1 is incapable of forming a hairpin loop due to the mismatch. Thus, hairpin loop formation and the resulting block to PCR amplification do not occur, and amplification of the allele 1 strand will proceed using the C primer.

[0115] FIG. 18 Exonuclease-based methods for the determination of a haplotype. In the DNA segment to be haplotyped, one identified site of polymorphism is an RFLP, so that the restriction enzyme (shown as BamHI) cleaves only one allele, generating fragments of different lengths from the two alleles.

[0116] FIG. 19 Exonuclease-based method for the determination of a haplotype. Using the fragments as shown and described in FIG. 16, the ends of the DNA fragments are protected from exonuclease digestion. The protected fragments are then digested with a second restriction enzyme whose recognition site (as shown, an NheI site) is located in one of the fragments but not the other, due to the overhang of the RFLP. Restriction digestion of the fragments with NheI shortens the BamHI fragment and also removes its protection from exonuclease digestion.

[0117] FIG. 20 Exonuclease-based method for the determination of a haplotype. The fragments generated as shown in FIG. 17 are then incubated in the presence of an exonuclease. As shown, the exonuclease will digest one of the fragments, but the protected fragments will remain undigested.

[0118] FIG. 21 Primer-mediated inhibition of allele-specific PCR amplification. Primers with the above characteristics were designed for haplotyping of the dihydropyrimidine dehydrogenase (DPD) gene. The DPD gene has two sites of variance in the coding region, at base 186 (T:C) and base 597 (A:G), which result in amino acid changes of Cys:Arg and Met:Val, respectively, as shown in the box of FIG. 27. The second site, at base 597, is a restriction fragment length polymorphism (RFLP) that is cleaved by the enzyme BsrD I if the A allele is present. The expected fragments are as shown in the figure.

[0119] FIG. 22 Allele-specific primers for the DPD gene. In A., three primers were designed which contain at least two different regions. The 3′ portion of the primer corresponds to the template DNA to be amplified. For the DPDASCF and the DPDASTF primers, additional nucleotides were added to the 5′ end of the primer which are complementary to the region in the sequence that contains the nucleotide variance. The DPDNSF primer contains only the DPD complementary sequence and will not result in allele-specific amplification. In B., the DPD gene sequence containing the site of polymorphism is shown.

[0120] FIG. 23 PCR amplification of the DPD gene using the DPDNSF primer. Shown is the hybridization of the DPDNSF primer to the template containing the T or C allele. Shown below are the expected products for the DPD gene region using the DPDNSF primer for the T or C allele.

[0121] FIG. 24 PCR amplification of the DPD gene using the DPDASTF primer. Shown is the hybridization of the DPDASTF primer to the template containing the T or C allele. Shown below are the expected products for the DPD gene region using the DPDASTF primer for the T or C allele.

[0122] FIG. 25 PCR amplification of the DPD gene using the DPDASCF primer. Shown is the hybridization of the DPDASCF primer to the template containing the T or C allele. Shown below are the expected products for the DPD gene region using the DPDASCF primer for the T or C allele.

[0123] FIG. 26 Stable hairpin loop structures formed with the reverse strand of the PCR product made using the DPDNSF primer, identified with the computer program Oligo4. Only the reverse strand is shown because this would be the strand to which the DPDNSF primer would hybridize on subsequent rounds of amplification. The hairpin loops are either not stable or have a low melting temperature.

[0124] FIG. 27 Stable hairpin loop structures formed with the reverse strand of the PCR product made using the DPDASCF primer, identified with the computer program Oligo4. As in FIG. 32, only the reverse strand is shown.

[0125] FIG. 28 Stable hairpin loop structures formed with the reverse strand of the PCR product made using the DPDASTF primer, identified with the computer program Oligo4. As in FIG. 32, only the reverse strand is shown.

[0126] FIG. 29 The primer hybridization and amplification events when further amplification using the DPDNSF primer is attempted on the generated PCR fragments. The DPDNSF primer is able to compete effectively with the hairpin structures formed with both the T and C alleles of the DPD gene, and thus amplification of both alleles proceeds efficiently.

[0127] FIG. 30 The primer hybridization and amplification events when further amplification using the DPDASCF primer is attempted on the generated PCR fragments. The DPDASCF primer is able to compete for hybridization with the hairpin loop formed with the C allele because its melting temperature is higher than the hairpin loop's (60° C. compared to 42° C.). The hairpin loop formed on the T allele, however, has a higher melting temperature than the primer and thus competes effectively with the primer for hybridization. The hairpin loop inhibits PCR amplification of the T allele, which results in allele-specific amplification of the C allele.
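The competition described for FIGS. 29-31 reduces to a simple comparison: a template strand is expected to amplify only if the primer's melting temperature exceeds that of the competing hairpin on that strand. The sketch below encodes that simplified two-state rule. The primer (60° C.) and C-allele hairpin (42° C.) values come from the text. The T-allele hairpin value is a hypothetical placeholder, since the text states only that it exceeds the primer's Tm.

```python
def alleles_expected_to_amplify(primer_tm, hairpin_tm_by_allele):
    """Return alleles whose template hairpin melts below the primer's Tm.

    Simplified two-state model: at each annealing step the primer competes
    with the intramolecular hairpin, and the structure with the higher Tm wins.
    """
    return sorted(a for a, tm in hairpin_tm_by_allele.items() if primer_tm > tm)


# DPDASCF: primer 60 C; C-allele hairpin 42 C (from the text);
# T-allele hairpin 65 C is a hypothetical value above the primer Tm.
print(alleles_expected_to_amplify(60, {"C": 42, "T": 65}))
```

Real primer/hairpin competition also depends on concentrations, salt, and annealing temperature; this rule only mirrors the qualitative argument in the figure descriptions.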

[0128] FIG. 31 The primer hybridization and amplification events when further amplification using the DPDASTF primer is attempted on the generated PCR fragments. The hairpin loop structure has a higher melting temperature than the primer for the C allele and a lower melting temperature than the primer for the T allele. This causes inhibition of primer hybridization and elongation on the C allele and results in allele-specific amplification of the T allele.

[0129] FIG. 32 The use of hair-pin loop formation for haplotyping the DPD gene is diagrammed, using a cDNA sample whose haplotype is known to be: Allele 1, T¹⁸⁶:A⁵⁹⁷; Allele 2, C¹⁸⁶:G⁵⁹⁷. The sizes of the fragments generated by BsrD I digestion of a 597 bp sequence amplified with the primers DPDNSF, DPDASTF, and DPDASCF depend on whether the base at site 597 is an A or a G. Restriction digestion by BsrD I indicates that an A is present at site 597. If the amplified sequence has an A at site 597, three fragments will be generated, of lengths 138, 164 and 267 bp. If a G is at site 597, only two fragments will be generated, of lengths 164 and 405 bp. If a sample is heterozygous for A and G at site 597, all four bands, of 138, 164 (2×), 267 and 405 bp, will be generated. The expected fragments generated by BsrD I restriction for each of the primers are indicated in the box.

[0130] FIG. 33 shows agarose gel electrophoresis of the fragments generated by amplification with each of the DPD primers in a cDNA sample heterozygous at both sites 186 and 597, followed by BsrD I digestion. The DPDNSF lane shows the restriction fragment pattern obtained for the selected cDNA with the DPDNSF primer, confirming that this sample is heterozygous at site 597. With the same cDNA sample and the primer DPDASTF (DPDASTF lane), however, the restriction pattern matches the pattern of a sample homozygous for A at site 597. Because the DPDASTF primer amplifies only the T allele, the haplotype of that allele in the sample must be T¹⁸⁶:A⁵⁹⁷. The restriction pattern obtained with the primer DPDASCF (DPDASCF lane) matches the pattern expected for a G at site 597. Because amplification of the cDNA sample with the primer DPDASCF amplifies only the C allele, the haplotype of this allele must be C¹⁸⁶:G⁵⁹⁷.
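The inference in FIG. 33 reads the base at site 597 from each lane and pairs it with the site-186 base selected by that lane's primer. The Python sketch below assumes the band sizes of FIG. 32 and the primer specificities of FIGS. 30 and 31; the function names and the "T186:A597" output format are hypothetical.

```python
# Base at site 186 selected by each allele-specific primer (FIGS. 30 and 31).
PRIMER_SELECTS_186 = {"DPDASTF": "T", "DPDASCF": "C"}


def bases_at_597(observed_bands):
    """Infer the base(s) at site 597 from the BsrD I bands seen in one lane.

    138 and 267 bp are diagnostic for A, 405 bp for G; 164 bp is shared.
    """
    bands = set(observed_bands)
    found = set()
    if {138, 267} <= bands:
        found.add("A")
    if 405 in bands:
        found.add("G")
    return found


def haplotype_from_lane(primer, observed_bands):
    """Pair the primer-selected site-186 base with the site-597 base."""
    found = bases_at_597(observed_bands)
    if len(found) != 1:
        raise ValueError(f"{primer} lane shows {sorted(found)}; expected one base")
    return f"{PRIMER_SELECTS_186[primer]}186:{found.pop()}597"


if __name__ == "__main__":
    print(sorted(bases_at_597([138, 164, 267, 405])))  # DPDNSF lane: both bases
    print(haplotype_from_lane("DPDASTF", [138, 164, 267]))
    print(haplotype_from_lane("DPDASCF", [164, 405]))
```

The non-selective DPDNSF lane only confirms heterozygosity; the haplotypes come from the two allele-specific lanes, each of which must show a single base at site 597.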

[0131] FIG. 34 shows the genotyping of the variance at genomic site 21250 in the ApoE gene. At this site a T:C variance in the DNA results in a cysteine-to-arginine amino acid change at position 176 of the ApoE protein. Two primers were designed to amplify the target region of the ApoE gene and to introduce two restriction enzyme sites (Fok I, Fsp I) into the amplicon adjacent to the site of variance. This figure depicts the sequences of the primers and the target DNA. The Apo21250-LFR primer is the loop primer, which contains the restriction enzyme recognition sites, and the ApoE21250-LR primer is the reverse primer used in the PCR amplification process. The polymorphic nucleotide is shown in italics.

[0132] FIG. 35 shows the sequence of the amplicon for both the T allele and the C allele of the ApoE gene following amplification. The polymorphic site is shown as an italic T or italic C.

[0133] 1. Genotyping Methods

[0134] This application concerns methods for determining the sequence of a DNA sample at a polymorphic site, often referred to as genotyping. Many genotyping methods are known in the art; however, the method described in this application has the advantages of being robust, highly accurate, and inexpensive to set up and perform. For these reasons the novel methods described herein are preferable to most currently available methods.

[0135] The disadvantages of existing genotyping methods include: unproven or inadequate accuracy (particularly for medical research, where very high accuracy is required); high set-up costs (which are unacceptable when relatively small numbers of subjects are being studied); technical difficulty in performing the test or interpreting the results; and lack of automatability.

[0136] The present invention describes a genotyping method based on mass spectrometric analysis of small DNA fragment(s) (preferably <25 bases) containing a polymorphic base. The small size of the DNA fragments generated allows them to be efficiently analyzed via mass spectrometry to determine the identity of the nucleotide at the polymorphic site. The DNA fragments generated preferably fall in the range between about 900 Daltons (3-mer) and 9,000 Daltons (30-mer), or between 900 and 7,500 Daltons (25-mer), or between 900 and 6,000 Daltons (20-mer), or between 900 and 4,500 Daltons (15-mer). However, as mass spectrometry technology progresses it will become possible to genotype DNA fragments outside this currently recommended range, so greater ranges are also included in preferred embodiments, e.g., 900 to 9,600 Daltons (32-mer), or 900 to 10,500 Daltons (35-mer), or 900 to 12,000 Daltons (40-mer). Thus the methods described herein are tailored to the capabilities of presently available commercial mass spectrometers; however, one skilled in the art will recognize that these methods can readily be adapted to improvements in mass spectrometry equipment, including, for example, MALDI instruments with improved desorption, delayed extraction or detection devices.
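To relate fragment length to the mass windows above, the following Python sketch estimates the average mass of a single-stranded DNA fragment from its base composition. It assumes commonly used average residue masses for unmodified DNA (A 313.21, T 304.20, C 289.18, G 329.21 Da, minus 61.96 Da for the termini, plus about 79.98 Da when the 5′ end is phosphorylated, as it is after restriction cleavage); these constants, the two 12-mers and the function names are assumptions of this example, not values given in the text.

```python
# Approximate average residue masses (Da) for unmodified single-stranded DNA.
RESIDUE_MASS = {"A": 313.21, "T": 304.20, "C": 289.18, "G": 329.21}
TERMINI_CORRECTION = -61.96    # oligo with 5'-OH and 3'-OH
FIVE_PRIME_PHOSPHATE = 79.98   # extra mass when the 5' end carries a phosphate


def oligo_mass(seq: str, five_prime_phosphate: bool = False) -> float:
    """Estimate the average mass (Da) of a single-stranded DNA fragment."""
    mass = sum(RESIDUE_MASS[base] for base in seq.upper()) + TERMINI_CORRECTION
    if five_prime_phosphate:
        mass += FIVE_PRIME_PHOSPHATE
    return mass


def in_window(mass: float, low: float = 900.0, high: float = 7500.0) -> bool:
    """True if the mass lies in the preferred 900-7,500 Da (<=25-mer) window."""
    return low <= mass <= high


if __name__ == "__main__":
    # Hypothetical 12-mers that differ only at a T/C polymorphic position.
    t_allele, c_allele = "GCATTCAGACTG", "GCATCCAGACTG"
    m_t = oligo_mass(t_allele, five_prime_phosphate=True)
    m_c = oligo_mass(c_allele, five_prime_phosphate=True)
    print(f"T allele {m_t:.2f} Da, C allele {m_c:.2f} Da, "
          f"difference {m_t - m_c:.2f} Da, in window: {in_window(m_t)}")
```

The allele difference (about 15 Da for T versus C) is what the mass spectrometer must resolve; the fragment's total mass only has to fall inside the instrument's useful window.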

[0137] The invention entails use of a single modified primer in a primer extension or amplification reaction. The modified primer is designed so as to introduce at least two restriction endonuclease recognition sites into the sequence of the primer extension product, which is preferably an amplicon in an amplification reaction. The restriction endonuclease recognition sites are designed such that they surround and/or span the polymorphic base to be genotyped and will liberate a small DNA fragment(s) containing the polymorphic base upon cleavage. If the natural sequence adjacent to the polymorphic site (on either the 5′ side or the 3′ side) already contains a restriction endonuclease recognition site, then it may be possible to design the modified primer so that one of the two restriction cleavage sites is not engineered into the primer (see below) but rather occurs naturally in the amplicon. In this event only one restriction site has to be engineered into the primer.

[0138] One embodiment of the invention involves the introduction of two restriction enzyme sites into the sequence of an amplicon in the vicinity of a polymorphic site during amplification. The two restriction enzyme sites are selected so that when the amplicon is incubated with the corresponding restriction enzymes, two small DNA fragments are generated, at least one of which contains the polymorphic nucleotide. The restriction enzyme sites are introduced during the amplification process by designing a primer that contains recognition sites for two restriction endonucleases. Two different methods for designing such primers are described below, but any strategy in which at least two cleavable sites are introduced into an amplicon using a single primer would be effective for this method.

[0139] One method involves the selected alteration of bases in the primer (relative to what they would be if the primer were to base pair perfectly with the natural sequence) so as to introduce restriction enzyme sites. An example of such a primer, incorporating recognition sites for the restriction enzymes Fok I and Fsp I, is shown in FIG. 1. The recognition sites and cleavage sites for Fok I and Fsp I are depicted in FIG. 2. Fok I is a type IIS restriction enzyme that cleaves DNA outside its recognition sequence: 9 bases 3′ to the recognition site on one strand and 13 bases away from the recognition site on the opposite strand, leaving a four-base overhang (protruding 5′ end) (FIG. 2). Designing the primer so that the Fok I recognition site is located 12 bases or fewer from the 3′ end of the primer ensures that Fok I cleaves outside the primer sequence and that the released fragment incorporates the polymorphic nucleotide for analysis. Fsp I is a useful enzyme to pair with Fok I because its recognition site overlaps that of Fok I, allowing the two sites to be partially combined (FIG. 3). This reduces the number of bases that must be introduced into the modified primer, making primer design simpler and the primer more likely to work for amplification.
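The cleavage geometry described above can be checked with a short simulation. This Python sketch assumes that Fok I cuts 9 nt 3′ of GGATG on the strand carrying that sequence and 13 nt away on the complementary strand, and that Fsp I cuts TGC^GCA bluntly. With the overlapping GGATGCGCA site of FIG. 3, this yields an 8-mer from one strand and a 12-mer, carrying the 4-nt 5′ overhang, from the other, matching the sizes given for FIG. 1. The amplicon sequence is hypothetical and the function names are illustrative.

```python
COMBINED_SITE = "GGATGCGCA"    # Fok I site GGATG overlapping Fsp I site TGCGCA (FIG. 3)
FOKI_SITE_LEN = 5              # length of GGATG
FSPI_OFFSET = 3                # TGCGCA begins 3 nt into the combined site
FSPI_CUT = 3                   # TGC^GCA: blunt cut after the third base
FOKI_TOP, FOKI_BOTTOM = 9, 13  # Fok I cuts 9/13 nt 3' of GGATG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def fok_fsp_fragments(top: str):
    """Return (top-strand fragment, bottom-strand fragment, bottom 5' overhang).

    Cut positions are computed on the top strand; fragments are reported 5'->3'.
    """
    i = top.find(COMBINED_SITE)
    if i < 0:
        raise ValueError("combined Fok I/Fsp I site not found")
    fsp_cut = i + FSPI_OFFSET + FSPI_CUT   # blunt: same position on both strands
    fok_top_cut = i + FOKI_SITE_LEN + FOKI_TOP
    fok_bottom_cut = i + FOKI_SITE_LEN + FOKI_BOTTOM
    if fok_bottom_cut > len(top):
        raise ValueError("too little sequence 3' of the Fok I site")
    top_fragment = top[fsp_cut:fok_top_cut]
    bottom_fragment = revcomp(top[fsp_cut:fok_bottom_cut])
    overhang = revcomp(top[fok_top_cut:fok_bottom_cut])
    return top_fragment, bottom_fragment, overhang


if __name__ == "__main__":
    # Hypothetical amplicon: flank + engineered site + downstream target bases.
    amplicon = "CCA" + COMBINED_SITE + "TTCAG" + "ACTG" + "GGCCTTAA"
    top, bottom, overhang = fok_fsp_fragments(amplicon)
    print(len(top), top)        # 8-mer
    print(len(bottom), bottom)  # 12-mer; its first 4 bases are the overhang
    print(overhang)
```

In this sketch, any base lying 10 to 13 nt 3′ of GGATG falls in the overhang region and is carried only by the 12-mer, consistent with the polymorphic base being found in the 12-mer.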

[0140] A primer (primer R) is designed in which some of the bases are changed from the target sequence. The changed bases are indicated by arrows above primer R. This primer, along with a second (normal) amplification primer designed in the reverse direction, is used to amplify the target sequence. The polymorphic base (T in the forward direction, A in the reverse direction) is indicated in italics and by an arrow below the target sequence. During amplification, the two restriction enzyme sites are incorporated into the sequence of the amplicon. The incorporated Fok I/Fsp I site is surrounded by the box in FIG. 1. When the amplicon is incubated with Fok I and Fsp I, cleavage occurs at both sites, releasing an 8-mer fragment and a 12-mer fragment. The 12-mer fragment contains the polymorphic base (A). These fragments are then analyzed by mass spectrometry to determine the identity of the base at the polymorphic site in the 12-mer.

[0141] The second method of primer design involves the use of a primer with an internal loop. The primer (primer R1, FIG. 4) is designed such that one of the bases corresponding to the native sequence is removed and replaced with a loop. In this case the G/C indicated by the arrow below the target sequence (FIG. 4) is replaced with the recognition sequence for Fok I and Fsp I. Upon hybridization to the DNA template, the primer forms a loop structure. This loop is incorporated into the amplicon during the amplification process, thereby introducing the Fok I and Fsp I restriction sites (indicated by the box in FIG. 4). When the amplicon is incubated with Fok I and Fsp I, cleavage releases an 8-mer and a 12-mer. As in the example in FIG. 1, the 12-mer contains the polymorphic base and can be analyzed by mass spectrometry to identify the base at the polymorphic site.

[0142] Both strategies result in an amplicon that can be cleaved with Fok I and Fsp I to liberate small DNA fragments, one of which contains the polymorphic nucleotide. The loop strategy (FIG. 4) is the preferred method because primer design is easier and more flexible.

[0143] Other restriction enzyme combinations also meet the requirements for generating DNA fragments suitable for genotyping by mass spectrometry. Two other examples are outlined in FIGS. 5 and 6. The only requirements for primer design are that the restriction enzyme site(s) generate a fragment(s) that is small enough to be easily analyzed by a mass spectrometer and that contains the polymorphic site. It is also a requirement that introducing the restriction enzyme site(s) into the primer does not eliminate the ability of the primer to generate an amplicon for the correct region of the target DNA. It does not matter whether the cleavage site for either enzyme generates a staggered 5′ overhang, a 3′ overhang, or a blunt end.

[0144] In another embodiment it may be desirable to generate a cleavage product with a 5′ overhang, as in the case of the Fok I and Fsp I example shown in FIG. 4. Following an amplification reaction (in which the Fok I and Fsp I sites have been incorporated into the amplicon; see the sequence in the box in FIG. 7), the remaining nucleotides are removed using any of a variety of methods known in the art, such as spinning through a size exclusion column such as Sephadex G50 or incubating with an alkaline phosphatase, e.g., shrimp alkaline phosphatase. The amplicon is then cleaved with the restriction enzyme (Fok I) that generates the 5′ overhang including the polymorphic base. This recessed end can then be filled in with nucleotides in which the normal nucleotide corresponding to one of the possible bases at the polymorphic site is replaced by a mass-modified nucleotide (T^(mod) in FIG. 7). The mass-modified nucleotide differs in mass from the normal nucleotides in a way that increases the mass difference normally seen between normal nucleotides. An example of such a nucleotide is bromo-deoxyuridine (BrdU), which is 64.8 Daltons higher in mass than dTTP. Table 1 lists the masses of the normal nucleotides and BrdU and the mass differences between each of the possible pairs of nucleotides. As is evident from the table, mass-modified nucleotides allow a greater separation in mass between the fragments, making analysis easier, especially in an automated mode. After fill-in of the recessed ends of the fragment, digestion with Fsp I then generates a fragment amenable to mass spectrometric analysis and identification of the polymorphism of interest. An advantage of this method over the others in the present invention is that it permits the use of either mass spectrometry or readily available electrophoretic detection methods.
In a preferred embodiment, after digestion with Fok I and an overhang fill-in reaction that includes a modified nucleotide representing one of the suspected polymorphic bases, the fragments analyzed by electrophoresis would migrate differently owing to the incorporation of the mass-modified nucleotide. Thus, one could identify the bases at suspected polymorphic sites in the prepared unlabeled fragments by either mass spectrometric or electrophoretic methods.
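The effect of a mass-modified nucleotide can be illustrated by recomputing pairwise residue mass differences with T shifted by the 64.8 Da cited above for BrdU. This Python sketch assumes commonly used average residue masses (A 313.21, T 304.20, C 289.18, G 329.21 Da), not the values of Table 1, which is not reproduced here; the function names are illustrative.

```python
from itertools import combinations

# Approximate average residue masses (Da) for unmodified DNA (assumed values).
RESIDUE_MASS = {"A": 313.21, "T": 304.20, "C": 289.18, "G": 329.21}
BRDU_SHIFT = 64.8  # BrdU versus thymidine, as stated in the text


def pairwise_differences(masses: dict) -> dict:
    """Absolute mass difference (Da) for every pair of residues, keyed 'X/Y'."""
    return {
        f"{a}/{b}": round(abs(masses[a] - masses[b]), 2)
        for a, b in combinations(sorted(masses), 2)
    }


def with_brdu(masses: dict) -> dict:
    """Copy of the residue table with T replaced by mass-modified T (BrdU)."""
    modified = dict(masses)
    modified["T"] = masses["T"] + BRDU_SHIFT
    return modified


if __name__ == "__main__":
    normal = pairwise_differences(RESIDUE_MASS)
    modified = pairwise_differences(with_brdu(RESIDUE_MASS))
    for pair in normal:
        print(pair, normal[pair], "->", modified[pair])
```

For a T/C polymorphism, the difference rises from about 15 Da to about 80 Da, and the smallest difference between any two residues rises from about 9 Da (A/T) to 16 Da (A/G).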

[0145] Alternatively, in the method described above employing a mass-modified nucleotide for recessed-end fill-in, use of a labeled primer (radioactive or fluorescent label) in the PCR reaction would produce a detectable signal when the samples are subjected to electrophoretic separation. In this case, a target DNA sample is amplified using a scheme similar to the one described above: a 5′-labeled primer with a Fok I restriction site is allowed to hybridize to the target DNA, forming a hairpin loop, and subsequent amplification incorporates the Fok I site into the amplicon. The resulting amplicon is digested with Fok I to separate the sequence 3′ of the site of polymorphism, and the residual nucleotides from the PCR reaction are removed as described above. The overhang is then filled in with a polymerase in the presence of natural nucleotides, with the nucleotide corresponding to one of the bases at the polymorphic site supplied as a dideoxynucleotide, or chain-terminating nucleotide. Thus, differential fill-in of the overhang will depend on the presence or absence of the polymorphism and hence on incorporation of a dideoxy chain-terminating nucleotide. In preferred embodiments, the primer is not labeled; instead, the dideoxy chain-terminating nucleotide representing one of the suspected polymorphic bases is labeled so that those fragments can be detected. In further preferred embodiments, the dideoxynucleotide for each polymorphic base carries a uniquely detectable label, and the polymorphic site is identified by the presence of one signal and absence of the other in homozygotes, or by the presence of both signals in heterozygotes.
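The calling rule in the last embodiment (one signal for a homozygote, both signals for a heterozygote) can be expressed as a small function. In this Python sketch the signal intensities, the relative threshold of 0.2 and the function name are hypothetical; a real assay would set its threshold from known control samples.

```python
def call_genotype(signals: dict, threshold: float = 0.2) -> str:
    """Call the polymorphic site from label intensities after ddNTP fill-in.

    signals maps each candidate base to the intensity of its uniquely labeled
    dideoxynucleotide. A label counts as present when its intensity is at least
    `threshold` times the strongest label (hypothetical cut-off).
    """
    strongest = max(signals.values(), default=0.0)
    if strongest <= 0:
        raise ValueError("no labeled fill-in product detected")
    present = sorted(b for b, s in signals.items() if s >= threshold * strongest)
    if len(present) == 1:
        return present[0] * 2     # one signal only: homozygote
    if len(present) == 2:
        return "".join(present)   # both signals: heterozygote
    raise ValueError(f"unexpected number of bases detected: {present}")


if __name__ == "__main__":
    print(call_genotype({"C": 1000.0, "T": 12.0}))
    print(call_genotype({"C": 800.0, "T": 950.0}))
```

Using a threshold relative to the strongest label, rather than an absolute one, keeps the call independent of overall reaction yield.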

[0146] It may also be necessary to incorporate only one restriction enzyme site into the amplicon via the primer. This can be done if the enzyme used is capable of making two double-strand cuts, one on the 5′ side and one on the 3′ side of the recognition site. An example of such an enzyme is Bcg I, which has a recognition site of 12/10(N)CGA(N)₆TGC(N)12/10 (FIG. 8). The arrows designate the sites of cleavage on both strands. This particular enzyme would generate fragments longer than the current optimal length for mass spectrometric analysis, so similar enzymes that cleave in a similar fashion but generate smaller fragments are more desirable. Also, as mass spectrometry techniques and instrumentation for DNA analysis progress, it may become possible to analyze DNA fragments of this length or greater reliably, with the sensitivity and resolution necessary to detect single-base differences in fragments of this length.

[0147] Restriction enzymes that only nick the DNA, instead of causing a double-strand break, can also be used. One such enzyme is N.BstNB I, whose recognition site is GAGTCNNNN^NN (the caret marks the nick). The fragments generated by this scheme are outlined in FIG. 9. This strategy would generate only one small fragment (a 10-mer in this case) instead of two, which may make analysis less complicated, especially in an automated mode.

[0148] A strategy similar to the nicking-enzyme strategy above can be accomplished using one restriction enzyme and a primer that contains a modification allowing the primer to be cleaved. In one such scheme, outlined in FIG. 10, one of the deoxyribonucleosides in the primer is substituted with a ribonucleoside (rG). The ribonucleoside is base-labile and will cause a break in the DNA backbone at that site. In this example, the amplicon is incubated with the restriction enzyme (Fok I), causing a double-strand break. The amplicon is then incubated in the presence of base, causing a break between the ribonucleotide G and the 3′ deoxyribonucleotide T and releasing a 7-base fragment that can easily be analyzed by mass spectrometry.

[0149] 2. Haplotyping Methods

[0150] Background

[0151] In mammals, as in many other organisms, there are two copies (alleles) of each gene in every cell (except some genes that map to the sex chromosomes, X and Y in man). One allele is inherited from each parent. The purpose of the haplotyping methods described in this application is to determine the sequence of the two alleles in a given subject. In general the two alleles in any organism are substantially similar in sequence, with polymorphic sites occurring less often than once every 100 nucleotides, and in some cases less often than once every 1,000 nucleotides. Determination of the sequence at non-variant nucleotide positions is not relevant to haplotyping. Thus, the problem of haplotyping comes down to determining the nucleotide on each of the two alleles at the polymorphic sites: for a subject that is heterozygous at two sites, where polymorphic site #1 is A or C and polymorphic site #2 is G or T, we wish to know whether the alleles are A-G and C-T, or A-T and C-G. When DNA is extracted from a diploid organism the two alleles are mixed together in the same test tube at a 1:1 ratio. Thus DNA analysis procedures performed on total genomic DNA, such as DNA sequencing or standard genotyping procedures that query the status of polymorphic sites one at a time, do not provide the information required to determine haplotypes from DNA samples that are heterozygous at two or more sites.
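The ambiguity described above grows with the number of heterozygous sites: with n heterozygous sites there are 2^(n−1) distinct pairs of haplotypes consistent with the same genotype. The following Python sketch enumerates them; the input format (one two-letter genotype per site) and the function name are assumptions of the example.

```python
from itertools import product


def possible_phasings(genotypes: list) -> list:
    """Enumerate the unordered haplotype pairs consistent with unphased genotypes.

    genotypes: one two-letter genotype per polymorphic site, e.g. ["AC", "GT"].
    Homozygous sites such as "AA" add no ambiguity. The loop visits 2**n
    assignments, which is adequate for the handful of sites in one segment.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


if __name__ == "__main__":
    # Site 1 is A or C, site 2 is G or T (the example in the text).
    print(possible_phasings(["AC", "GT"]))
```

For the two-site example this returns the two alternatives named in the text, A-G with C-T, or A-T with C-G; a genotyping assay on total genomic DNA cannot distinguish between them.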

[0152] The determination of haplotypes is particularly useful for genetic analysis when the DNA segment being haplotyped contains polymorphisms that are in some degree of linkage disequilibrium with each other, that is, they do not assort randomly in the population being studied. In general, linkage disequilibrium breaks down with increasing physical distance in the genome; however, the distance over which linkage disequilibrium is maintained varies widely in different areas of the genome. Thus the length of DNA over which an ideal haplotyping procedure should operate will differ from one gene to another. In general, however, it is desirable to determine haplotypes over distances of at least 2 kb, more preferably at least 5 kb, still more preferably at least 10 kb, and most preferably at least 20 kb. Procedures for determining extended haplotypes (i.e. haplotypes >10 kb in length) are emphasized in this application; however, in many cases haplotypes spanning shorter distances may be completely acceptable and may capture all or virtually all of the biologically relevant variation in a larger region of DNA.

[0153] In genes that consist of two or more DNA segments that are not in linkage disequilibrium, owing to intervening DNA regions subject to a high frequency of recombination, the preferred approach to haplotype determination is to determine haplotypes separately in each of the constituent regions. The subsequent genetic analysis of genotype-phenotype relationships entails consideration of all the haplotype groups that exist among the two or more haplotyped segments. Consider, for example, a 15 kb DNA segment in which there is a high frequency of recombination in a central 3 kb segment, but substantial linkage disequilibrium in two flanking 6 kb segments, A and B. The haplotype analysis strategy might consist of determining all the common haplotypes (or haplotype groups; see below) in each of the two 6 kb segments, then considering all the possible combinations of A and B haplotypes. For example, if there are three haplotypes or haplotype groups at A (a, a′ and a″) and four at B (b, b′, b″, b′″), then all the combinations (a:b, a:b′, a:b″, a:b′″, a′:b, a′:b′, a′:b″, a′:b′″, etc.) that occur at, say, a frequency of 5% or greater would be analyzed with respect to relevant phenotypes.

[0154] Four Approaches to Haplotyping

[0155] Three groups of haplotyping methods based on allele enrichment are described in this application, plus a fourth group of methods based on visualizing single DNA molecules optically. The usual starting material for all these haplotyping methods is total genomic DNA. In some cases total cellular RNA (or cDNA) may be the starting material. (RNA- or cDNA-based methods are predicated on the assumption that both alleles of a gene are transcribed equally; this assumption does not always hold, so it should be tested experimentally in any case where cDNA is being considered as the starting material for a genotyping or haplotyping procedure.)

[0156] Each of the three families of allele enrichment methods preferably involves three steps. The first step is to determine the genotype of at least one polymorphic site in the starting genomic DNA to provide the basis for design of an allele enrichment procedure. Preferably two or more polymorphic sites are genotyped, and most preferably all polymorphic sites in the DNA segment of interest are genotyped. If two or more sites are heterozygous at the DNA locus of interest, then the sample must be subjected to a haplotyping method to determine the haplotypes. The second step entails obtaining, from a sample of genomic DNA (or RNA or cDNA) containing two alleles of a gene or other DNA segment of interest, a population of DNA molecules enriched for only one allele. This can be accomplished using any of a variety of novel methods described herein. The third step is a genotyping procedure performed on the enriched DNA. The genotyping procedure will reveal that, at each site that is heterozygous in the subject's genomic DNA (as determined in the first step), only one allele is present in the enriched material. Alternatively, the allele ratio in the enriched material is sufficiently imbalanced, compared to the 1:1 allele ratio in genomic DNA, that the enriched allele can be identified with certainty. In either event, the nucleotides that must be present on the non-enriched allele can be deduced by “subtracting” the haplotype of the enriched allele from the genotype of the starting DNA, determined in step 1. For example, for a DNA segment that is heterozygous at three sites, where site 1 has A or T, site 2 has C or T and site 3 has A or G, if a first haplotype is 1=A, 2=T, 3=A, then the other haplotype must be 1=T, 2=C, 3=G.
However, rather than simply deducing the second haplotype, a preferred method for haplotype analysis entails the independent determination of both haplotypes present in a sample, by enriching and subsequently genotyping each of the two alleles present in a sample in separate experiments; the two haplotypes should collectively account for the genotype determined from the DNA sample in step 1. This practice increases the accuracy of the haplotyping methods described herein.
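The “subtraction” step and the cross-check of two independently determined haplotypes can be sketched in a few lines of Python. The genotype representation (one two-letter string per heterozygous site) and the function names are assumptions of this example; the three-site data are the example from the text.

```python
def other_haplotype(genotypes: list, enriched: str) -> str:
    """Deduce the non-enriched haplotype by 'subtracting' the enriched one.

    genotypes: one two-letter genotype per heterozygous site (step 1), e.g. "AT".
    enriched:  the haplotype read from the enriched DNA (step 3), one base per site.
    """
    if len(genotypes) != len(enriched):
        raise ValueError("need exactly one enriched base per genotyped site")
    other = []
    for genotype, base in zip(genotypes, enriched):
        if base not in genotype:
            raise ValueError(f"base {base} is not part of genotype {genotype}")
        # Remove one copy of the enriched base; the remaining letter is the other allele.
        other.append(genotype.replace(base, "", 1))
    return "".join(other)


def consistent(genotypes: list, hap1: str, hap2: str) -> bool:
    """Check that two independently determined haplotypes explain every genotype."""
    if not len(genotypes) == len(hap1) == len(hap2):
        return False
    return all(sorted(a + b) == sorted(g) for g, a, b in zip(genotypes, hap1, hap2))


if __name__ == "__main__":
    sites = ["AT", "CT", "AG"]            # site 1 A/T, site 2 C/T, site 3 A/G
    print(other_haplotype(sites, "ATA"))  # enriched haplotype 1=A, 2=T, 3=A
```

The consistent check corresponds to the preferred practice of enriching both alleles separately: the two measured haplotypes must together reproduce the step 1 genotype at every site.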

[0157] Step 1 of the procedure described above is mostly dispensable; it is possible to proceed directly to DNA strand enrichment knowing the location of only one polymorphic site (which will provide the basis for designing an enrichment procedure for one allele). Virtually any genotyping procedure will work in step 3; however, because allele enrichment is generally not complete, quantitative or semi-quantitative genotyping methods are preferred. Good quantitative genotyping methods will permit accurate haplotypes to be determined even when the degree of allele enrichment is only 2:1, or even less. On the other hand, if substantial allele enrichment is achieved in step 2, then the genotyping procedure of step 3 may consist of performing DNA sequencing reactions on the enriched material. For example, chain-terminating DNA sequencing reactions could be used to determine the haplotype of the enriched DNA.

[0158] A fourth group of haplotyping methods involves microscopic visualization of single DNA molecules that have been treated in a manner that produces allele-specific changes at polymorphic sites. These haplotyping methods are based on the optical mapping and sequencing methods of D. Schwartz, described in U.S. Pat. No. 5,720,928. They are described separately in section 6 below.

[0159] Three Groups of Allele Enrichment Haplotyping Methods

[0160] The three groups of haplotyping methods dependent on allele enrichment differ in the procedure used to obtain a population of DNA molecules enriched for one of two alleles present in a starting sample. The three different enrichment methods entail: (i) allele capture strategies (summarized in FIG. 11 and described in detail below in section 3), applicable to double- or single-stranded DNA, in which an easily isolated material is linked preferentially to one allele; (ii) allele-selective DNA cleavage strategies, which exploit allele differences in the presence of restriction enzyme cleavage sites; after cleavage of one allele, the other allele may be selectively amplified or separated by a size selection procedure, or the cleaved allele may be removed by an allele-selective degradation procedure (summarized in FIG. 12 and described below in section 4); and (iii) selective amplification strategies, in which one allele is preferentially amplified from a mixture of two alleles by the design of a primer or primers that exploit sequence differences at polymorphic sites (summarized in FIG. 13 and described below in section 5).

[0161] Degree of Allele Enrichment Required for Haplotyping

[0162] Strand enrichment by any of the above methods need not be quantitative or completely selective in order to produce an accurate and reproducible haplotyping result. Even if both alleles are still present after enrichment, as long as one allele is consistently present in greater amount than the other, the enrichment may be adequate to produce satisfactory discrimination between alleles in a subsequent genotyping test. Preferably the degree of strand enrichment is at least 1.5-fold, more preferably at least two-fold, more preferably at least four-fold, still more preferably at least six-fold, and most preferably at least ten-fold. Further enrichment beyond 10-fold is desirable but is unlikely to produce significant changes in the accuracy of the haplotyping test. The adequacy of haplotype determination using a DNA population that is only partially enriched for the desired allele can be determined by repeated analyses of known samples to determine the error rate associated with different known allele ratios.
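A simple way to apply a minimum enrichment ratio to quantitative genotyping data from the enriched sample is sketched below in Python. The 1.5-fold default mirrors the lowest preferred value above; the signal values, data layout and function names are hypothetical, and in practice the threshold would be validated against known samples as described.

```python
def enriched_allele(allele_a: str, signal_a: float,
                    allele_b: str, signal_b: float,
                    min_fold: float = 1.5):
    """Return the allele enriched at one heterozygous site, or None if unclear."""
    if signal_a > 0 and signal_a >= min_fold * signal_b:
        return allele_a
    if signal_b > 0 and signal_b >= min_fold * signal_a:
        return allele_b
    return None  # ratio below min_fold: enrichment not adequate at this site


def enriched_haplotype(site_signals, min_fold: float = 1.5):
    """site_signals: one ((allele, signal), (allele, signal)) pair per het site."""
    calls = [enriched_allele(a, sa, b, sb, min_fold)
             for (a, sa), (b, sb) in site_signals]
    return None if None in calls else "".join(calls)


if __name__ == "__main__":
    data = [(("A", 400.0), ("T", 100.0)), (("C", 80.0), ("T", 320.0))]
    print(enriched_haplotype(data))
```

Returning None rather than guessing at a site with a weak ratio leaves the decision to repeat or re-enrich with the user.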

[0163] Yield of Enriched Alleles Required for Haplotyping

[0164] After allele enrichment, the population of DNA molecules available for genotyping analysis is necessarily smaller than the starting number of DNA molecules, because no enrichment procedure permits 100% recovery of the selected allele. However, just as a high degree of allele selectivity is not necessary during enrichment, a high yield of the enriched allele is not necessary either. The amount of enriched allele will of course depend in part on the quantity of starting DNA. Thus, in a haplotyping experiment that starts with one microgram of genomic DNA, only a small fraction of the alleles in the starting material (as little as 0.1%) has to be captured by the allele enrichment procedure, provided the subsequent genotyping step (usually PCR-based) is sensitive enough to amplify an amount of template (~300 copies) that would normally be found in 1 ng of genomic DNA. If necessary, the PCR amplification step of the genotyping procedure can be modified to increase sensitivity using methods known in the art, such as nested PCR (two rounds of PCR, first with an outside set of primers, then with an inside set) or an increased number of PCR cycles. Also, to compensate for low capture efficiency, the quantity of input genomic DNA or cDNA can be increased to 2 µg, 4 µg or even 10 µg or more. Preferably the fraction of input alleles captured by the enrichment procedure is at least 0.01% of the starting number of alleles, more preferably at least 0.05%, still more preferably at least 0.25%, still more preferably at least 2%, and most preferably at least 10%. Capturing a still higher fraction of the input alleles does not contribute significantly to the performance of the procedure, and in fact is undesirable if it compromises the selectivity of strand enrichment.
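The yield arithmetic above can be made explicit. This Python sketch takes the text's figure of roughly 300 template copies per nanogram of genomic DNA and treats the capture fraction as a fraction of those copies; the function names are illustrative. It reproduces two anchor points from the text: 0.1% capture from 1 µg yields about 300 copies, and at 0.01% capture about 10 µg of input is needed to reach the same number.

```python
COPIES_PER_NG = 300     # ~template copies in 1 ng of genomic DNA (from the text)
REQUIRED_COPIES = 300   # template the genotyping PCR must detect (~1 ng equivalent)


def captured_copies(input_ug: float, capture_fraction: float) -> float:
    """Copies recovered when capture_fraction of the input copies is captured."""
    return input_ug * 1000 * COPIES_PER_NG * capture_fraction


def min_input_ug(capture_fraction: float, required: float = REQUIRED_COPIES) -> float:
    """Smallest input (micrograms) that still yields `required` captured copies."""
    return required / (1000 * COPIES_PER_NG * capture_fraction)


if __name__ == "__main__":
    print(f"{captured_copies(1, 0.001):.0f} copies from 1 ug at 0.1% capture")
    for fraction in (0.0001, 0.0005, 0.0025, 0.02, 0.10):
        print(f"capture {fraction:.2%}: need >= {min_input_ug(fraction):.4f} ug")
```

If nested PCR lowers the detection limit, REQUIRED_COPIES can be reduced and the input requirement scales down proportionally.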

[0165] Controlling the Size of DNA Molecules to be Haplotyped

[0166] Before performing allele enrichment procedures on DNA fragments it may be desirable to control the size of the input DNA by random or specific cleavage procedures. One reason is that very long DNA fragments may be significantly more difficult to enrich selectively than shorter fragments (due, for example, to a greater tendency for shear forces to break long fragments, or a greater tendency for long fragments to adhere to or be trapped by particles or matrices required for separation). Therefore it is preferable to produce DNA fragments that are only moderately longer than the region to be haplotyped (which is determined by the biological problem being analyzed and by the location and relationship of DNA polymorphisms, including the degree of linkage disequilibrium in the region being analyzed; see discussion above). The DNA segment to be haplotyped may include a gene, part of a gene, a gene regulatory region such as a promoter, enhancer or silencer element, or any other DNA segment considered likely to play a role in a biological phenomenon of interest.

[0167] DNA fragments in the desired size range can be produced by random fragmentation procedures (e.g. shearing DNA physically by pipetting, stirring or use of a nebulizer), by partial or complete restriction endonuclease digestion, or by controlled exposure to a DNAase such as E. coli DNAase I.

[0168] With random or semi-random DNA fragmentation procedures, such as partial nuclease digestion, the aim is to produce a collection of DNA fragments, most of which span the entire region to be haplotyped (and contain the site that will be used to effect allele enrichment). Mathematical methods can be used to determine the optimal size distribution; for example, a size distribution may be selected in which 80% of the fragments span the target region, assuming random distribution of DNA breakpoints. Preferably at least 50% of the DNA fragments are in this size range.
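One simple model for choosing a size distribution, assumed here rather than taken from the text, treats breakpoints as independent and uniformly distributed (a Poisson process). The probability that a given copy of a target region of length T contains no breakpoint is then exp(−T/L), where L is the mean fragment length, so keeping 80% of target copies intact requires L = T / ln(1/0.8), or roughly 4.5 T. This is a related but not identical quantity to the fraction of fragments spanning the region. The Python sketch below computes it; the function names are illustrative.

```python
import math


def intact_fraction(target_bp: float, mean_fragment_bp: float) -> float:
    """Probability that a target copy has no breakpoint inside it (Poisson model)."""
    return math.exp(-target_bp / mean_fragment_bp)


def mean_size_for(target_bp: float, desired_fraction: float = 0.8) -> float:
    """Mean fragment length needed so that desired_fraction of target copies stay intact."""
    if not 0 < desired_fraction < 1:
        raise ValueError("desired_fraction must lie strictly between 0 and 1")
    return target_bp / math.log(1 / desired_fraction)


if __name__ == "__main__":
    for target in (2_000, 5_000, 10_000, 20_000):
        mean = mean_size_for(target)
        print(f"{target} bp region: mean fragment ~{mean:,.0f} bp "
              f"(intact {intact_fraction(target, mean):.2f})")
```

Under this model a 10 kb target calls for a mean fragment length of roughly 45 kb, which illustrates why very long regions push against the handling problems noted above.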

[0169] Complete restriction endonuclease digestion is another useful way to control the size of input DNA molecules, particularly when the full DNA sequence or the restriction map of the DNA segment to be haplotyped is known. Restriction digestion with enzymes that cleave DNA at polymorphic sites produces restriction fragments of different lengths from different alleles (so-called restriction fragment length polymorphisms, or RFLPs). Cleaving at restriction sites that produce RFLPs can be used to produce DNA molecules that do or do not contain binding sites for DNA binding molecules (e.g. DNA binding proteins, oligonucleotides, PNAs or small molecules that bind DNA) such that only one of two alleles in a genomic DNA sample contains the binding site. For this approach to work, the location of all binding sites for the allele-specific DNA binding molecule must be taken into account. The preparation of DNA molecules for haplotyping by specific DNA cleavage can be performed so as to produce molecules that will perform optimally in the allele-specific binding step.

[0170] If single-stranded DNA is to be the input material for haplotyping, then the optimal size distribution of DNA molecules is preferably obtained while the DNA is still double-stranded, using any of the methods described above. The sample can then be denatured, subjected to an allele enrichment step, and genotyped to determine the haplotypes.

[0171] 3. Haplotyping: Double and Single Strand-Based Allele Enrichment Methods

[0172] 3.1 Introduction

[0173] The goal of allele selection methods is to physically fractionate a genomic DNA sample (the starting material) so as to obtain a population of molecules enriched for one allele of the DNA segment or segments to be analyzed. The details of the procedure depend on the polymorphic nucleotide(s) that provide the basis for allele enrichment and on the immediate flanking sequence upstream and/or downstream of the polymorphic site. As explained below, different types of sequence polymorphisms lend themselves to different types of allele enrichment methods.

[0174] Once a polymorphic site is selected for allele enrichment, the enrichment steps are as follows:
(i) prepare DNA fragments for allele enrichment;
(ii) add to the DNA fragments a molecule that binds DNA in a sequence specific manner (hereafter referred to as the ‘DNA binding molecule’), such that one allele of the target DNA segment will be bound and the other will not;
(iii) allow complexes to form between DNA fragments and the allele specific DNA binding molecule under conditions optimized for allele selective binding;
(iv) add a second reagent, such as an antibody, that binds to the allele specific DNA binding molecule (which in turn is bound to DNA fragments, including fragments comprising the selected allele);
(v) remove the complex consisting of {selected allele + DNA binding molecule + second reagent bound to the DNA binding molecule} from the starting DNA sample by physical, affinity (including immunological), chromatographic or other means;
(vi) release the bound DNA from the complex; and
(vii) genotype it to determine the haplotype of the selected allele.
These steps are described in greater detail below, except for step (ii), the addition of an allele specific DNA binding molecule. Step (ii) is described at greater length in the following sections, each of which covers one class of allele specific DNA binding molecules in detail.

[0175] (i) Preparation of DNA fragments for allele enrichment. The condition of the DNA may be controlled in several ways: DNA concentration, size distribution, state of the DNA ends (blunt, 3′ overhang, 5′ overhang, specific sequence at the end, etc.), degree of elongation, and so on. The DNA is also preferably suspended in a buffer that maximizes sequence specific DNA binding. Preferred DNA quantities for these procedures range from 100 nanograms to 10 micrograms of genomic DNA in a volume of 10 to 1000 microliters. Lower amounts of DNA and lower volumes are preferred, in order to control costs and to minimize the amount of blood or tissue that must be obtained from a subject to yield sufficient DNA for a successful haplotyping procedure. As described above, the size of the DNA fragments is preferably controlled so that a majority of the desired fragments span the DNA segment to be haplotyped. The length of such a segment may vary from 500 nucleotides to 1 kb, 3 kb, 5 kb, 10 kb, 20 kb, 50 kb, 100 kb or more. Fragments of the desired size may be produced by random or specific DNA cleavage procedures, as described above. The exploitation of allele specific sequences at the ends of restriction cleaved DNA molecules for haplotyping is described below. Optimal buffer and binding conditions provide maximum discrimination between binding of the allele specific DNA binding molecule to the selected allele and its binding to the non-selected allele. (Binding of the DNA binding molecule to many other irrelevant DNA fragments in the genomic DNA is unavoidable, but it should not interfere with enrichment of the selected allele.)
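The following sketch gives a rough sense of how little target is present at the DNA quantities above. It converts a mass of human genomic DNA into copies and molar concentration of one allele at a single-copy locus in a heterozygous sample. It assumes a 3.1 Gb haploid genome and about 650 g/mol per base pair. Both constants and the helper names are assumptions chosen for illustration.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one dsDNA base pair
HAPLOID_GENOME_BP = 3.1e9    # assumed human haploid genome size

def allele_copies(mass_ng: float) -> float:
    """Copies of one allele at a single-copy locus in a heterozygous sample
    (one copy of that allele per diploid genome)."""
    diploid_mass_g = 2 * HAPLOID_GENOME_BP * BP_MASS_G_PER_MOL / AVOGADRO
    return mass_ng * 1e-9 / diploid_mass_g

def allele_molarity(mass_ng: float, volume_ul: float) -> float:
    """Molar concentration of that allele in the given volume."""
    return allele_copies(mass_ng) / AVOGADRO / (volume_ul * 1e-6)

for mass_ng, volume_ul in [(100, 10), (100, 1000), (10_000, 10), (10_000, 1000)]:
    copies = allele_copies(mass_ng)
    fm = allele_molarity(mass_ng, volume_ul) * 1e15
    print(f"{mass_ng} ng in {volume_ul} uL: ~{copies:,.0f} copies, ~{fm:.3g} fM")
```

Under these assumptions, 100 ng contains roughly 15,000 copies of the selected allele, which is about 2.5 fM in 10 µL. One nanogram contains roughly 150 copies. The target is therefore far more dilute than the nanomolar dissociation constants typical of sequence specific binding proteins.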

[0176] (ii) Described below are several types of allele specific DNA binding molecules, including proteins, peptides, oligonucleotides and small molecules, as well as combinations of these molecules. These molecules may be designed or selected to bind double stranded (ds) or single stranded (ss) DNA in a sequence specific manner.

[0177] (iii) Complexes are formed between DNA and the allele specific DNA binding molecule under conditions optimized for binding specificity: that is, conditions of ionic strength, pH, temperature and time that promote formation of specific complexes between the binding molecules and the DNA. Optimization of allele selective binding conditions will in general be empirical. In addition to optimization of salt, pH and temperature, it may include the addition of cofactors. Cofactors include molecules known to affect DNA hybridization properties, such as glycerol, spermidine or tetramethylammonium chloride (TMAC), as well as molecules that exclude water, such as dextran sulfate and polyethylene glycol (PEG). Optimization of temperature may entail use of a temperature gradient, for example ramping the temperature from >95° C. down to <40° C.
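A temperature gradient such as the >95° C. to <40° C. ramp mentioned above is typically programmed as a series of steps. The sketch below generates one such stepwise cooling program. The defaults (96° C. start, 38° C. end, 2° C. step, 60 s hold) are hypothetical values chosen only to satisfy the stated range. They would need empirical optimization like the other binding conditions.

```python
import math

def ramp_schedule(start_c: float = 96.0, end_c: float = 38.0,
                  step_c: float = 2.0, hold_s: int = 60) -> list[tuple[float, int]]:
    """Stepwise cooling program as (temperature in deg C, hold in s) pairs,
    from start_c down to end_c inclusive (the last step may be shorter)."""
    if step_c <= 0 or end_c > start_c:
        raise ValueError("need step_c > 0 and end_c <= start_c")
    n_steps = math.ceil((start_c - end_c) / step_c)
    # Compute each temperature from the start to avoid accumulating float error.
    temps = [start_c - i * step_c for i in range(n_steps)] + [end_c]
    return [(t, hold_s) for t in temps]

program = ramp_schedule()
print(len(program), program[0], program[-1])   # 30 (96.0, 60) (38.0, 60)
```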

[0178] (iv) After the selected DNA fragment is bound to an allele specific DNA binding molecule, a second reagent can be added to the reaction mix. This reagent may be an antibody, an aptamer, a streptavidin- or nickel-coated bead, or another ligand that binds to the allele specific DNA binding molecule. Said second reagent forms complexes with the DNA binding molecules (and any DNA fragments they are bound to) that facilitate their removal from the remaining DNA fragments. This step can be omitted if the DNA binding molecule added in step (ii) already contains or is attached to a ligand or a bead, or is otherwise modified in a way that facilitates separation after formation of allele specific complexes in step (iii). For example, if the DNA binding molecule is a protein, it can be modified by appending a polyhistidine tag or an epitope for antibody binding, such as the hemagglutinin (HA) epitope of influenza virus. Then, before, during or after step (iii), nickel-coated beads can be added to the DNA sample. Alternatively, the sample can be delivered to a nickel column for chromatography, using methods known in the art (e.g. QIAexpress Ni-NTA Protein Purification System, Qiagen, Inc., Valencia, Calif.). First, free DNA is washed through the column. Then the DNA bound to the polyhistidine-containing DNA binding protein is eluted with 100-200 mM imidazole, using methods known in the art. In this way, DNA fractions enriched for both alleles (bound and unbound) are collected from one procedure. An equivalent procedure for an epitope-tagged DNA binding molecule could include the addition of antibody-coated beads to form {bead-protein-DNA} complexes, which could then be removed by a variety of physical methods (see below). Alternatively, the protein-DNA complexes could be run over an antibody column (using an antibody that binds to the epitope engineered into the allele specific DNA binding molecule). 
An important consideration in designing and optimizing a specific allele enrichment procedure is that the enrichment conditions be sufficiently mild. They must not dissociate the {DNA binding molecule + selected allele} complexes to the extent that too little DNA remains at the end of the procedure for robust DNA amplification and genotyping.

[0179] (v) The {DNA binding molecule + selected allele} complexes (with or without an optional third moiety bound to the DNA binding protein) may be separated from the remainder of the DNA sample by physical, affinity (including immunological) or other means. Preferred methods for removing complexes include the following:
- application of a magnetic field to remove magnetic beads attached to the selected allele via the DNA binding molecule or another moiety;
- centrifugation (e.g. using a dense bead coated with a ligand, such as an antibody, nickel, streptavidin or another ligand known in the art, that binds to the DNA binding molecule);
- filtration (for example, using a filter that arrests ligand-coated beads, to which the DNA binding molecule and the attached DNA fragments are bound, while allowing free DNA molecules to pass through);
- immunological methods (for example, an antibody column that binds the DNA binding molecule bound to the selected DNA, or that binds to a ligand which in turn is bound to the DNA binding molecule); or
- affinity chromatography (e.g. chromatography over a nickel column if the DNA binding molecule is a protein modified to include a polyhistidine tag, or if it is bound to a second molecule that contains such a tag).
The allele specific DNA binding molecule and its bound DNA can be separated from the remaining DNA by any of the above or related methods known in the art. Many of these are available in kit form from companies such as Qiagen, Novagen, Invitrogen, Stratagene, Promega, Clontech, Amersham/Pharmacia Biotech, New England Biolabs and others known to those skilled in the art.

[0180] (vi) Release the bound DNA from the partially purified complexes containing the selected allele. This step can be accomplished by chemical or thermal denaturing conditions (addition of sodium hydroxide; boiling) or by milder changes in buffer conditions (salt, cofactors) that reduce the affinity of the DNA binding molecule for the selected allele.

[0181] (vii) Genotyping the enriched DNA to determine the haplotype of the selected allele can be accomplished by the genotyping methods described herein or by other genotyping methods known in the art. These include chemical cleavage methods (Nucleave, Variagenics, Cambridge, Mass.), primer extension based methods (Orchid, Sequenom, others), Cleavase-based methods (Third Wave, Madison, Wis.), bead-based methods (Luminex, Illumina), miniaturized electrophoresis methods (Kiva Genetics) and DNA sequencing. The key requirement of any genotyping method is that it be sufficiently sensitive to detect the amount of DNA remaining after allele enrichment. If only a small quantity of DNA remains after allele enrichment (less than 1 nanogram), it may be necessary to increase the number of PCR cycles, or to perform a two-step amplification procedure, in order to boost the sensitivity of the genotyping procedure. For example, the enriched allele can be subjected to 40 cycles of PCR amplification with a first set of primers, and the product of that PCR can then be subjected to a second round of PCR with two new primers internal to the first set of primers.

[0182] No DNA amplification is required at any step of the enrichment procedure until the final genotyping step. Allele enrichment methods are therefore not constrained by the limitations of amplification procedures such as PCR, and the length of fragments that can be analyzed is, in principle, quite large. (In contrast, amplification procedures such as PCR generally become technically difficult above 5-10 kb, and very difficult or impossible above 20 kb, particularly when the template is human genomic DNA or genomic DNA of similar complexity.) It can also be difficult during amplification (e.g. by polymerase chain reaction) to prevent some degree of in vitro allele interchange. During the denature-renature cycles of the PCR, primer extension products that have not extended all the way to the reverse primer (i.e. incompletely extended strands) may anneal to a different template strand than the one they originated from. In some cases this template corresponds to a different allele, and the result is an in vitro recombinant DNA product that does not correspond to any naturally occurring allele. In contrast, there is no chance of artifactual DNA strand interchange with the allele enrichment methods described in this application. Apart from these advantages over amplification methods, the strand selection methods described below are also attractive because they avoid the costs of optimizing and carrying out a long range PCR amplification. Furthermore, the allele enrichment procedures described herein are for the most part generic: the same basic steps can be followed for any DNA fragment.

[0183] 3.2 Double Stranded vs. Single Stranded Allele Selection Methods

[0184] Allele selection may be accomplished using single or double stranded DNA. Single stranded DNA is produced by denaturing double stranded DNA, for example by heating or by treatment with alkali. This is preferably done after a sizing procedure has been applied to the double stranded DNA to achieve an optimal size distribution of DNA fragments. Both single and double stranded DNA methods have advantages and disadvantages. One advantage of single stranded methods is that the specificity of Watson-Crick base pairing can be exploited for the affinity capture of one allele. Disadvantages of single strand methods include:
(i) the propensity of single stranded DNA molecules to anneal to themselves (forming complex secondary structures) or to other, only partially complementary, single stranded molecules. For example, the ubiquitous human DNA repeat element Alu (which is up to ~280 nucleotides long) may cause two otherwise non-complementary strands to anneal; and
(ii) the greater susceptibility of single stranded DNA to breakage, compared with double stranded DNA. Strand breaks destroy the physical contiguity that is essential for haplotyping.

[0185] Double stranded DNA has several advantages over single stranded DNA as the starting point for the haplotyping methods of this invention. First, it is less susceptible to breakage. Second, it is less likely to bind non-specifically to itself or to other DNA molecules (whether single stranded or double stranded). Third, a variety of high affinity, sequence specific interactions can provide the basis for allele enrichment. These include interactions between double stranded DNA and proteins (e.g. restriction enzymes, transcription factors, natural and artificial zinc finger proteins), between double stranded DNA and single stranded DNA or modified oligonucleotides (e.g. via Hoogsteen or reverse Hoogsteen base pairing), and between double stranded DNA and small molecules (e.g. polyamides). Another type of structure that can be exploited for allele enrichment is the D-loop, formed by strand invasion of a duplex DNA molecule by an oligonucleotide or a DNA-like molecule such as peptide nucleic acid (PNA). D-loop formation can be facilitated by the addition of E. coli RecA protein, using methods known in the art. Fourth, restriction enzyme-cleaved double stranded DNA may have termini that can provide the basis for allele specific treatments. These include affinity selection (e.g. ligation to an adapter strand), strand degradation (e.g. selective degradation of one allele but not the other), circularization and the other procedures described below.

[0186] 3.3 Protein-Based Allele Enrichment Methods

[0187] In protein-based allele enrichment methods, a DNA binding protein with differential affinity for the two allelic DNA segments to be separated is added to a mixture of genomic DNA or cDNA fragments (which have been denatured in the case of single strand methods). Since most sequence specific binding proteins recognize double stranded DNA, dsDNA is the preferred starting material for protein-based allele enrichment methods. Next, complexes are formed between the binding protein and DNA segments containing the sequence motif recognized by the binding protein. The DNA segments to be haplotyped must differ with respect to the presence or absence of the protein binding site. As described above, this requirement can be met in several ways. The most preferred is to use restriction endonuclease(s) to produce DNA fragments that contain only one binding site for the binding protein (selected allele) or none (other allele). Subsequently, the protein-DNA complexes are purified from the DNA that is not protein bound. This purification can be accomplished by adding a second reagent that (i) binds to the binding protein and (ii) can be physically separated from the mixture. The second reagent may be a bead, a magnetic particle, a surface or any other structure that facilitates the physical separation step, and it may be modified by coupling to a protein binding reagent. Specific schemes for binding to proteins include antibodies, nickel-polyhistidine affinity, avidin-biotin affinity and similar schemes known to those skilled in the art. The complexes containing binding protein bound to DNA can be physically separated by gravity, centrifugation, a magnetic field, filtration, chromatography, electrophoresis or other means. The final step is to genotype the purified material, which yields the haplotype.

[0188] 3.3.1 Protein-Based Double Stranded Allele Selection Methods

[0189] The major categories of naturally occurring sequence specific DNA binding proteins include zinc finger proteins and helix-turn-helix transcription factors. In addition, proteins that normally act on DNA as a substrate can be made to act as DNA binding proteins in either of two ways: (i) alteration of the aqueous environment (e.g. removal of ions, substrates or cofactors essential for the enzymatic function of the protein, such as divalent cations); or (ii) mutagenesis of the protein to disrupt catalytic, but not binding, function. Classes of enzymes that bind to specific dsDNA sequences include restriction endonucleases and DNA methylases. (For a recent review see: Roberts R. J. and D. Macelis. REBASE--restriction enzymes and methylases. Nucleic Acids Res. Jan. 1, 2000;28(1):306-7.) Finally, in vitro evolution methods (DNA shuffling, dirty PCR and related methods) can be used to create and select proteins or peptides with novel DNA binding properties. The starting material for such methods can be the DNA sequence of one or more known DNA binding proteins. This sequence can be mutagenized globally or in specific segments known to affect DNA binding, or otherwise permuted, and the products then tested or selected for DNA binding properties. Alternatively, the starting material may be genes that encode enzymes for which DNA is a substrate, e.g. restriction enzymes, DNA or RNA polymerases, DNA or RNA helicases, topoisomerases, gyrases or other enzymes. Such experiments might be useful for producing sequence specific ssDNA binding proteins as well as sequence specific dsDNA binding proteins. For recent descriptions of in vitro evolution methods see: Minshull J. and W. P. Stemmer: Protein evolution by molecular breeding. Curr Opin Chem Biol. June 1999;3(3):284-90; Giver, L., and F. H. Arnold: Combinatorial protein design by in vitro recombination. Curr Opin Chem Biol. June 1998;2(3):335-8; Bogarad, L. D. and M. W. Deem: A hierarchical approach to protein molecular evolution. Proc Natl Acad Sci USA. Mar. 16, 1999;96(6):2591-5; Gorse, D., Rees, A., Kaczorek, M. and R. Lahana: Molecular diversity and its analysis. Drug Discov Today. June 1999;4(6):257-264.

[0190] Among the classes of DNA binding proteins enumerated above that could be used to select DNA molecules, a preferred class of proteins would have the following properties:
(i) Any two sequences differing by one nucleotide (or by one nucleotide pair, in the case of dsDNA) can be discriminated, regardless of whether one version of the sequence is a palindrome and without any other sequence constraint.
(ii) DNA binding proteins can be designed or selected using standard conditions, so that the design or selection of proteins for many different sequence pairs is not onerous. (This requirement arises because, to readily select any given DNA molecule for haplotyping, it is desirable to have a large collection of DNA binding proteins, each capable of discriminating a different pair of sequences.)
(iii) The affinity of the protein for the selected DNA sequence is sufficient to withstand the physical and/or chemical stresses introduced in the allele enrichment procedure.
(iv) The DNA binding molecules are stable enough to remain in native conformation during the allele enrichment procedure, and can be stored for long periods of time.
(v) The length of sequence bound by the allele specific DNA binding protein is preferably at least six nucleotides (or nucleotide pairs), more preferably at least 8 nucleotides, and most preferably 9 nucleotides or longer. The longer the recognition sequence, the fewer molecules in the genomic DNA fragment mixture will be bound, and therefore the less ‘background’ DNA will accompany the enriched allele.
In addition to the five foregoing criteria, it may be desirable to make a fusion between the DNA binding protein and a second protein so as to facilitate enrichment of the DNA binding protein. For example, appending an epitope-containing protein would allow selection by antibody-based methods, and appending six or more histidine residues would allow selection by nickel affinity methods. 
(DNA binding proteins may also be useful in the microscopy-based haplotyping methods described elsewhere in the application. For that purpose, it may be useful to make a fusion with a protein that produces a detectable signal, for example green fluorescent protein.)
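Criterion (v) can be quantified with a simple random-sequence model. In a genome of G base pairs, a given n-nucleotide recognition sequence is expected to occur about G/4^n times per strand. The sketch below assumes equal base frequencies and a 3.1 Gb haploid genome. Neither assumption holds exactly for real human DNA, so the numbers are order-of-magnitude estimates only.

```python
def expected_sites(n: int, genome_bp: float = 3.1e9, palindromic: bool = False) -> float:
    """Expected occurrences of one n-nucleotide sequence in a random genome
    (equal base frequencies assumed). A non-palindromic site can occur on
    either strand, so it is counted twice; a palindrome reads the same on both."""
    strands = 1 if palindromic else 2
    return genome_bp * strands / 4 ** n

for n in (6, 8, 9, 12, 18):
    print(f"{n:2d} nt: ~{expected_sites(n):,.2f} expected sites per haploid genome")
```

Under this model, a non-palindromic 6-nucleotide site is expected roughly 1.5 million times per haploid genome and a 9-nucleotide site roughly 24,000 times. An 18-nucleotide site is expected less than once. This is why longer recognition sequences bring less background DNA along with the enriched allele.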

[0191] 3.3.2 Zinc Finger Proteins

[0192] Given the above criteria, zinc finger proteins are a preferred class of DNA binding proteins. It is well established that zinc finger proteins can bind to virtually any DNA sequence motif. In particular, they are not limited to palindromic sequences, as both type II restriction endonucleases and helix-turn-helix transcription factors are. See, for example: Choo, Y. and A. Klug (1994) Proc. Natl. Acad. Sci. U.S.A. 91: 11163-11167. Jamieson, A. C., Wang, H. and S.-H. Kim (1996) A zinc finger directory for high-affinity DNA recognition. Proc. Natl. Acad. Sci. U.S.A. 93: 12834-12839. Segal, D. J., Dreier, B., Beerli, R. R. and C. F. Barbas (1999) Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5′-GNN-3′ DNA target sequences. Proc. Natl. Acad. Sci. U.S.A. 96: 2758-2763. Segal, D. J. and C. F. Barbas (2000) Design of novel sequence specific DNA-binding proteins. Curr. Opin. Chem. Biol. 4: 34-39. These papers and other work in the field demonstrate that it is possible to generate zinc finger proteins that will bind virtually any DNA sequence from 3 nucleotides up to 18 nucleotides. Further, these studies show that in vitro generated zinc finger proteins can bind specific DNA sequences with low nanomolar or even subnanomolar affinity. They can also distinguish sequences that differ by only one base pair, with 10- to 100-fold or even greater differences in affinity. It has also been demonstrated that zinc finger proteins can be modified by fusion with other protein domains that provide detectable labels or attachment domains. For example, zinc finger proteins can be fused with jellyfish green fluorescent protein (GFP) for labeling purposes. For attachment and selection purposes, they can be fused to polyhistidine at the amino or carboxyl terminus, or fused with an antibody binding domain such as glutathione S-transferase (GST) or influenza virus hemagglutinin (HA), for which there are commercially available antisera.

[0193] Methods for making zinc finger proteins of desired sequence specificity are well known in the art and have recently been adapted to large scale experiments. See, in addition to the above references: Beerli R. R., Dreier B. and C. F. Barbas (2000) Positive and negative regulation of endogenous genes by designed transcription factors. Proc Natl Acad Sci USA. 97: 1495-1500; Beerli R. R., Segal D. J., Dreier B. and C. F. Barbas (1998) Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proc Natl Acad Sci USA. 95: 14628-14633. Methods for using phage display to select zinc finger proteins with desired specificity from large libraries have also been described: Rebar E. J. and C. O. Pabo (1994) Zinc finger phage: affinity selection of fingers with new DNA-binding specificities. Science. 263(5147):671-673. Rebar E. J., Greisman H. A. and C. O. Pabo (1996) Phage display methods for selecting zinc finger proteins with novel DNA-binding specificities. Methods Enzymol. 267:129-149. The phage display method offers one way to bind selected alleles to a large complex that can be efficiently removed from a mixture of DNA fragments. Preventing nonspecific DNA binding to intact phage requires careful optimization of blocking conditions.

[0194] For the haplotyping methods described in this application, the length of the DNA sequence recognized by a zinc finger protein may range from 3 nucleotides to 18 or more nucleotides. Sequences shorter than 6 nucleotides are generally not useful for haplotyping. Unless they contain the dinucleotide CpG, they occur too frequently in DNA to allow haplotyping over distances of several kb or longer. For DNA fragments longer than about 3 kb, both alleles will frequently carry one or more copies of any sequence motif shorter than six nucleotides. The reason is that the average interval between occurrences of a given three-, four- or five-nucleotide sequence is 4^3 = 64, 4^4 = 256 and 4^5 = 1024 nucleotides, respectively. (These figures assume a random-order sequence with equal frequency of each of the four nucleotides, neither of which actually holds.) If both alleles have a copy of a target sequence, that sequence clearly cannot be used to selectively enrich one allele. Preferred zinc finger proteins recognize 6, 9, 12 or 18 nucleotides, with the longer sequences preferred. However, the length of sequence bound is only one of several important considerations in selecting an optimal zinc finger protein. Equally important are the specificity and the affinity of binding. Preferably, a zinc finger protein has a specificity of at least 10-fold, and more preferably 100-fold or greater, with respect to all sequences that differ from the selected sequence by one or more nucleotides. This level of specificity will provide the allele selectivity required for successful allele enrichment. Optimal zinc finger proteins must also have a high affinity for the selected sequence. Preferably, the dissociation constant of the zinc finger protein for the target DNA sequence is less than 50 nanomolar, more preferably less than 10 nanomolar, and most preferably less than 2 nanomolar. 
Multiple zinc finger proteins that meet all the enumerated criteria have been produced, demonstrating the ability of the skilled artisan to make the necessary modifications to naturally occurring zinc finger proteins. Methods for improving the specificity and affinity of binding include random or site directed mutagenesis, selection of phage bearing mutant zinc finger proteins with the desired specificity from large phage libraries, and in vitro evolution methods.
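The interplay between affinity and specificity can be illustrated with a single-site equilibrium binding model. The sketch makes three simplifying assumptions. First, the protein is in large excess over its DNA sites, so free protein is approximately total protein. Second, the non-selected allele is bound with a dissociation constant equal to the target Kd multiplied by the specificity factor. Third, losses during washing and nonspecific background are ignored. Names and example values are illustrative.

```python
def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """Equilibrium occupancy of one site when free protein ~ total protein."""
    return protein_nm / (protein_nm + kd_nm)

def enrichment(protein_nm: float, kd_selected_nm: float, specificity: float):
    """Bound fractions of the selected and the other allele, and their ratio
    (the fold enrichment starting from a 1:1 heterozygous mixture)."""
    f_sel = fraction_bound(protein_nm, kd_selected_nm)
    f_other = fraction_bound(protein_nm, kd_selected_nm * specificity)
    return f_sel, f_other, f_sel / f_other

for protein_nm in (0.5, 2.0, 10.0, 50.0):
    f_sel, f_other, ratio = enrichment(protein_nm, kd_selected_nm=2.0, specificity=100)
    print(f"{protein_nm:5.1f} nM: selected {f_sel:.2f}, other {f_other:.3f}, {ratio:.1f}-fold")
```

Take a 2 nM dissociation constant and 100-fold specificity as an example. At 2 nM protein, about half of the selected allele and about 1% of the other allele are bound, an enrichment of roughly 50-fold. Raising the protein to 50 nM binds most of the selected allele but lowers enrichment to about 5-fold. In this model, lower protein concentrations favor selectivity and higher ones favor yield.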

[0195] Because each zinc finger recognizes three nucleotides, one way to make zinc finger proteins that recognize sequences of six nucleotides or longer is to assemble two or more zinc fingers with known binding properties. Barbas and colleagues have demonstrated the use of zinc fingers as modular building blocks (see: Proc Natl Acad Sci USA. 95: 14628-14633, 1998) for nucleotide sequences of the form (GNN)_x, where G is guanine, N is any of the four nucleotides, and x is the number of times the GNN motif is repeated.
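The modular assembly of (GNN)_x targets can be sketched as a simple check that splits a target into triplets and assigns each triplet a finger position. The sketch assumes the usual antiparallel arrangement, in which the N-terminal finger contacts the 3′-most triplet. The function name and example sequences are hypothetical. The sketch does not choose or design the fingers themselves.

```python
def gnn_modules(target: str) -> list[tuple[int, str]]:
    """Split a (GNN)_x target into triplets and pair each with a finger number.
    Finger 1 (N-terminal) is assigned the 3'-most triplet (antiparallel binding)."""
    target = target.upper()
    if not target or len(target) % 3:
        raise ValueError("target length must be a positive multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    bad = [t for t in triplets if t[0] != "G" or set(t) - set("ACGT")]
    if bad:
        raise ValueError(f"not of the form (GNN)_x: {bad}")
    return [(finger, t) for finger, t in enumerate(reversed(triplets), start=1)]

print(gnn_modules("GAAGCTGTC"))   # [(1, 'GTC'), (2, 'GCT'), (3, 'GAA')]
```

A target such as "GAATTCGCC" is rejected because its middle triplet (TTC) does not begin with G. Such a target would fall outside the (GNN)_x building-block set described above.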

[0196] A large number of zinc finger proteins exist in nature, and a still larger number have been created in vitro (e.g. see the work cited above). Any of these known zinc finger proteins may be a useful starting point for constructing a set of allele specific DNA binding proteins. The protein Zif268 is the most extensively characterized zinc finger protein. It has the additional advantage that there is relatively little target site overlap between adjacent zinc fingers, which makes it well suited to the modular construction of zinc finger proteins with desired DNA sequence binding specificity. See, for example: Segal, D. J., et al. Proc Natl Acad Sci USA. 96: 2758-2763, 1999. Zif268 is a preferred backbone for production of mutant zinc finger proteins.

[0197] 3.3.3 Restriction Endonucleases

[0198] Restriction endonucleases are another class of sequence specific DNA binding proteins useful for allele enrichment. There are over 400 commercially available restriction endonucleases, and hundreds more have been discovered and characterized with respect to their binding specificity. (Roberts R. J. and D. Macelis. Nucleic Acids Res. Jan. 1, 2000;28(1):306-7.) Collectively, these enzymes recognize a substantial fraction of all 4, 5 and 6 nucleotide sequences (of which there are 256, 1024 and 4096, respectively). For certain polymorphic nucleotides, the exquisite sequence specificity of these enzymes can be used to selectively bind one allelic DNA fragment that contains the cognate recognition site. The DNA fragment corresponding to the other allele, which lacks the cognate site, is not bound. Restriction endonucleases lack the flexibility of zinc finger proteins, which can selectively bind virtually any sequence up to 18 nucleotides in length. They are, however, highly specific, readily available and for the most part inexpensive to produce. Identifying polymorphic sites that lie within restriction enzyme recognition sequences will become much simpler once the sequence of the human genome is completed, since the generation of restriction maps will then be primarily a computational, rather than an experimental, activity.

[0199] For restriction endonucleases to be useful as DNA binding proteins, their DNA cleaving function must first be neutralized or inactivated; otherwise the DNA is cleaved and released. Inactivation can be accomplished in two ways. First, one can add restriction endonucleases to DNA, allow them to bind under conditions that do not permit cleavage, and then remove the DNA-protein complex. The simplest way to prevent restriction enzyme cleavage is to withhold divalent cations from the buffer. Second, one can alter restriction endonucleases so that they still bind DNA but cannot cleave it. This can be done by altering the sequence of the gene encoding the restriction endonuclease, using methods known in the art, or by post-translational modification of the restriction endonuclease using chemically reactive small molecules.

[0200] The first approach, withholding essential cofactors such as magnesium or manganese, has the advantage that no modification of the restriction enzymes or the genes that encode them is necessary. Instead, conditions are identified that are permissive for binding but nonpermissive for cleavage.

[0201] It may not be possible to identify such conditions for all restriction enzymes, because some enzymes appear to require divalent cations for specific, high affinity recognition of cognate DNA. For such enzymes, it may be possible to produce mutant forms that do not require divalent cations for high affinity, specific binding to cognate DNA. For example, mutants of the restriction enzyme Mun I (which binds the sequence CAATTG) have been produced that recognize and bind (but do not restrict) cognate DNA with high specificity and affinity in the absence of magnesium ion. In contrast, wild type Mun I does not exhibit sequence specific DNA binding in the absence of magnesium ion. The amino acid changes in the mutant Mun I enzymes (D83A, E98A) have been proposed to simulate the effect of magnesium ion in conferring specificity. See, for example: Lagunavicius, A. and V. Siksnys (1997) Site-directed mutagenesis of putative active site residues of Mun I restriction endonuclease: replacement of catalytically essential carboxylate residues triggers DNA binding specificity. Biochemistry 36: 11086-11092.

[0202] Structural modification of restriction enzymes to alter their cleaving properties, but not their binding properties, in the presence of magnesium ion has also been demonstrated. For example, studies of the restriction enzyme Eco RI (which binds the sequence GAATTC) have shown that DNA sequence recognition and cleaving activity can be dissociated. Mutant Eco RI enzymes with various amino acid substitutions at residues Met137 and Ile197 bind cognate DNA (i.e. 5′-GAATTC-3′) with high specificity, but cleave with reduced or unmeasurably low activity. See: Ivanenko, T., Heitman, J. and A. Kiss (1998) Mutational analysis of the function of Met137 and Ile197, two amino acids implicated in sequence specific DNA recognition by the Eco RI endonuclease. Biol. Chem. 379: 459-465. Other work has identified mutant Eco RI proteins that have substantially increased affinity for the cognate binding site while lacking cleavage activity. For example, the Eco RI mutant Gln111 binds GAATTC with ~1,000-fold higher affinity than the wild type enzyme, but has a ~10,000-fold lower rate constant for cleavage. (See: King, K., Benkovic, S. J. and P. Modrich (1989) Glu-111 is required for activation of the DNA cleavage center of EcoRI endonuclease. J. Biol. Chem. 264: 11807-15.) Eco RI Gln111 has been used to image Eco RI sites in linearized 3.2-6.8 kb plasmids using atomic force microscopy, a method that exploits the high binding affinity and negligible cleavage activity of the mutant protein. The Eco RI Gln111 protein is a preferred reagent in the methods of this invention for the selective enrichment of alleles that contain a GAATTC sequence (and the consequent depletion of alleles that lack such a sequence). Exemplary conditions for selective binding of Eco RI Gln111 to DNA fragments carrying the cognate sequence may include ~50-100 mM sodium chloride, 10-20 mM magnesium ion (e.g. MgCl₂) and pH 7.5 in Tris or phosphate buffer. 
Preferably there is molar equivalence of Eco RI Gln111 and cognate DNA binding sites in the sample (e.g. genomic DNA); more preferably there is a 5-, 10-, 20- or 50-fold molar excess of enzyme over DNA binding sites. Preferred methods for separating the Eco RI-bound allele from the non-bound allele include the synthesis of a fusion protein between Eco RI Gln111 and a protein domain that provides a binding site for a commercially available antibody or other affinity reagent. Influenza hemagglutinin, beta galactosidase, glutathione S-transferase and polyhistidine domains are available as commercial kits for protein purification.
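The molar ratios above depend on how many cognate sites a given mass of genomic DNA contains. The following is a minimal sketch of that arithmetic, assuming ~650 g/mol per base pair of double stranded DNA and a user-supplied number of sites per genome; the function names are hypothetical, and the random-sequence site estimate in the example is an illustration, not a measured count.

```python
# Hypothetical helper: estimate cognate-site picomoles in a genomic DNA
# sample and the Eco RI Gln111 needed for a chosen molar ratio.
# Assumption: ~650 g/mol per base pair of double stranded DNA.
AVG_BP_MASS = 650.0


def genome_copies_mol(dna_ng: float, genome_bp: float) -> float:
    """Moles of genome copies contained in dna_ng nanograms of DNA."""
    return (dna_ng * 1e-9) / (genome_bp * AVG_BP_MASS)


def cognate_sites_pmol(dna_ng: float, genome_bp: float, sites_per_genome: float) -> float:
    """Picomoles of cognate binding sites in the sample."""
    return genome_copies_mol(dna_ng, genome_bp) * sites_per_genome * 1e12


def enzyme_pmol(dna_ng: float, genome_bp: float, sites_per_genome: float,
                fold_excess: float = 1.0) -> float:
    """Picomoles of binding protein (counted as binding units, e.g. dimers)
    needed for `fold_excess` over the cognate sites."""
    return fold_excess * cognate_sites_pmol(dna_ng, genome_bp, sites_per_genome)


if __name__ == "__main__":
    genome_bp = 3.1e9                 # assumed haploid human genome size
    sites = genome_bp / 4 ** 6        # random-sequence estimate for GAATTC
    for fold in (1, 5, 10, 20, 50):
        print(f"{fold:>2}x excess: {enzyme_pmol(1000, genome_bp, sites, fold):.3f} pmol")
```

Under these assumptions, 1 µg of genomic DNA carries roughly 0.38 pmol of GAATTC sites, so a 10-fold excess corresponds to roughly 3.8 pmol of binding protein; a measured site count should replace the random-sequence estimate where one is available.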

[0203] There are several schemes for producing, from genomic DNA, two homologous (allelic) fragments of a gene that differ with respect to the presence or absence of a sequence such as an Eco RI site. Scheme 1: if the complete sequence of the region being haplotyped is known, then the location and identity of all restriction sites, including the subset of restriction sites that include polymorphic nucleotides in their recognition sequences, can be determined trivially by computational analysis using commercially available software. Those restriction sites that overlap polymorphic nucleotides in the DNA segment of interest can be assessed for suitability as allele enrichment sites. The optimal characteristics of an allele enrichment site include: (i) The site occurs once, or not at all (depending on the allele), in the DNA segment to be haplotyped. This is crucial, since allele enrichment is based on the attachment of a protein to the binding site in the allele to be enriched, and on the absence of that site in the other allele present in the genomic DNA sample being haplotyped. (ii) There is a pair of nonpolymorphic restriction sites, different from the site being used for allele enrichment, that flank the polymorphic site and span a DNA segment deemed useful for haplotype analysis.
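Criterion (i) lends itself to a simple computational screen. The sketch below, with hypothetical function names, scans a known segment for a recognition sequence on both strands and keeps only those SNPs for which one allele has exactly one site and the other allele has none. It assumes each SNP is evaluated on its own, with every other position held at the reference base, and it does not check criterion (ii).

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMP)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` on either strand (overlaps allowed)."""
    hits = set()
    for motif in {site, revcomp(site)}:  # one entry if the site is palindromic
        start = seq.find(motif)
        while start != -1:
            hits.add(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def enrichment_candidates(segment: str, snps: dict[int, tuple[str, str]],
                          site: str) -> list[int]:
    """SNP positions where one allele has exactly one copy of `site` in the
    segment and the other allele has none (criterion (i))."""
    chosen = []
    for pos, (a1, a2) in sorted(snps.items()):
        counts = []
        for base in (a1, a2):
            variant = segment[:pos] + base + segment[pos + 1:]
            counts.append(len(find_sites(variant, site)))
        if sorted(counts) == [0, 1]:
            chosen.append(pos)
    return chosen
```

For example, in the toy segment `CCCCGAATTCCCCC`, an A/G SNP at position 6 creates or destroys the only GAATTC site and is returned, whereas a C/T SNP at position 1 leaves one site on both alleles and is rejected.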

[0204] The steps for allele enrichment then comprise: restrict genomic DNA with the selected enzyme(s) that flank the polymorphic site so as to produce a DNA segment useful for haplotype analysis (as well as many other genomic DNA fragments); add the DNA binding protein (i.e. the cleavage-inactive restriction enzyme) in a buffer that promotes specific binding to the cognate site (and, if necessary, prevents the restriction enzyme from cleaving its cognate site); and selectively remove the restriction enzyme-DNA complex from the genomic DNA by any of the physical or affinity based methods described above (antibody, nickel-histidine, etc.). Subsequently, suspend the enriched allele in aqueous buffer and genotype two or more polymorphic sites to determine a haplotype. Scheme 2 is similar but does not require a specific restriction step. Instead, one randomly fragments genomic DNA into segments that, on average, are approximately the length of the segment to be haplotyped, then adds the DNA binding protein and proceeds with the enrichment as above. A disadvantage of this scheme is that there may be DNA fragments that include non-polymorphic copies of the cognate sequence for the DNA binding protein. Such fragments will limit the degree of allele enrichment because they will co-purify with the targeted allele and produce background signal in the subsequent analysis steps. This problem can be addressed by reducing the average size of the fragments in the random fragmentation procedure.
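Why smaller fragments help in Scheme 2 can be made concrete with a simple model. Assume, as an illustration only, that random breakpoints fall as a Poisson process with mean spacing L, and that a non-polymorphic copy of the cognate sequence lies at distance d from the polymorphic site. Then the other allele is co-captured at the polymorphic position only if no break falls between the two points, which has probability exp(-d/L). The sketch below (hypothetical names) turns this into an expected enrichment ratio, assuming the targeted allele is always captured.

```python
import math


def prob_same_fragment(distance_bp: float, mean_fragment_bp: float) -> float:
    """P(no breakpoint between two points `distance_bp` apart) when breakpoints
    form a Poisson process with mean spacing `mean_fragment_bp`."""
    return math.exp(-distance_bp / mean_fragment_bp)


def expected_enrichment(distance_bp: float, mean_fragment_bp: float) -> float:
    """Targeted:other allele ratio at the polymorphic site, assuming the
    targeted allele is always captured and the other allele is captured only
    through a linked background site `distance_bp` away."""
    p = prob_same_fragment(distance_bp, mean_fragment_bp)
    return math.inf if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    for mean_len in (20_000, 10_000, 5_000, 2_000):
        print(mean_len, round(expected_enrichment(5_000, mean_len), 1))
```

In this model, shrinking the mean fragment length from 5 kb to 1 kb raises the expected enrichment against a background site 5 kb away from about e (2.7:1) to about 148:1, at the cost of shorter haplotyped segments.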

[0205] Because the enriched allele fragment must contain zero or one copies of the sequence used for attachment of the restriction enzyme, optimal restriction enzymes for these haplotyping methods recognize sequences of 5 nucleotides or greater; preferably they recognize sequences of 6 nucleotides or greater. Preferably the cognate sites of such enzymes contain one or more dinucleotides or other sequence motifs that are proportionately underrepresented in the genomic DNA of the organism being haplotyped; for haplotyping methods applied to mammalian genomic DNA, they preferably contain one or more 5′-CpG-3′ sequences, because CpG dinucleotides are substantially depleted in mammalian genomes. Restriction enzymes whose recognition sequences include CpG dinucleotides include Taq I, Msp I, Hha I and others known in the art.
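The preference for longer and CpG-containing recognition sequences follows from expected site spacing. The sketch below estimates the mean distance between sites under an independent-base model with a given GC fraction, and applies a crude multiplicative penalty for each CpG in the site using an assumed CpG observed/expected ratio. The function name and the penalty model are illustrative assumptions, not a description of real genome statistics.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def expected_spacing_bp(site: str, gc: float = 0.5, cpg_obs_exp: float = 1.0) -> float:
    """Mean spacing (bp) between occurrences of `site` in random sequence.

    gc          -- assumed genomic GC fraction
    cpg_obs_exp -- assumed CpG observed/expected ratio, applied once per CpG
                   in the site (a crude depletion model)
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site:
        prob *= p[base]
    prob *= cpg_obs_exp ** site.count("CG")
    # A non-palindromic site can occur on either strand, doubling its rate.
    strands = 1 if site == revcomp(site) else 2
    return 1.0 / (prob * strands)


if __name__ == "__main__":
    # Illustrative mammalian-like values (assumptions): 41% GC, CpG o/e 0.2.
    for s in ("GAATTC", "TCGA", "CCGG", "GCGC"):
        print(s, round(expected_spacing_bp(s, gc=0.41, cpg_obs_exp=0.2)))
```

With uniform base composition this gives the familiar 4,096 bp for a 6-bp palindrome; the CpG penalty shows why a 4-bp CpG site can be much rarer than its length alone suggests.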

[0206] 3.3.4 Additional Restriction Endonuclease Based Methods

[0207] A limitation of the restriction enzyme based allele capture method is that the length of DNA fragment that can be haplotyped depends on the local restriction map. In some cases it may be difficult to find a polymorphic restriction site for which a cleavage-inactive restriction enzyme is available and for which the nearest 5′ and 3′ flanking sites are at an optimal distance for haplotyping; often the flanking restriction enzyme cleavage sites will be closer to the polymorphic site than desired, limiting the length of DNA segment that can be haplotyped. For example, it may be optimal from a genetic point of view to haplotype a 15 kb segment of DNA, but there may be no polymorphic restriction sites flanked by sites that allow isolation of the desired 15 kb segment. One approach to this problem is to haplotype several small DNA fragments that collectively span the 15 kb segment of interest. A composite haplotype can then be assembled by analysis of the overlaps between the small fragments.
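Assembling a composite haplotype from overlapping fragments amounts to orienting each fragment's two haplotypes against the growing composite using shared heterozygous sites. The sketch below is one way to do this, assuming fragments are supplied in order along the segment, each as a pair of haplotypes keyed by position, and that consecutive fragments share at least one heterozygous site; the function name is hypothetical.

```python
Haplotype = dict[int, str]


def assemble_composite(fragments: list[tuple[Haplotype, Haplotype]]) -> tuple[Haplotype, Haplotype]:
    """Chain fragment haplotype pairs into two composite haplotypes."""
    comp1, comp2 = dict(fragments[0][0]), dict(fragments[0][1])
    for h1, h2 in fragments[1:]:
        # Overlapping positions that are heterozygous in the composite carry phase.
        shared = [p for p in h1 if p in comp1 and comp1[p] != comp2[p]]
        if not shared:
            raise ValueError("no shared heterozygous site; phase cannot be linked")
        if all(h1[p] == comp1[p] for p in shared):
            a, b = h1, h2
        elif all(h2[p] == comp1[p] for p in shared):
            a, b = h2, h1
        else:
            raise ValueError("inconsistent phase at overlapping sites")
        comp1.update(a)
        comp2.update(b)
    return comp1, comp2


if __name__ == "__main__":
    frag1 = ({100: "A", 200: "C"}, {100: "G", 200: "T"})
    frag2 = ({200: "T", 300: "A"}, {200: "C", 300: "G"})
    print(assemble_composite([frag1, frag2]))
```

In the example, the second fragment's haplotypes are listed in the opposite order; the shared site at position 200 is used to swap them so that each composite haplotype stays on one chromosome.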

[0208] A more general, and more useful, method for circumventing the limitations occasionally imposed by difficult restriction maps is to incorporate aspects of the RecA assisted restriction endonuclease (RARE) method in the haplotyping procedure. (For a description of the RARE procedure see: Ferrin, L. J. and R. D. Camerini-Otero [1991] Science 254: 1494-1497; Koob, M. et al. [1992] Nucleic Acids Research 20: 5831-5836.) When RARE techniques are used in the protein mediated allele enrichment method, it is possible to haplotype DNA segments of virtually any length, regardless of the local restriction site map.

[0209] First, the DNA is sized, either by random fragmentation to produce fragments in the right size range (e.g. approximately 15 kb average size), or by using any restriction endonuclease or pair of restriction endonucleases to cleave genomic DNA (based on the known restriction map) so as to produce fragments spanning the segment to be haplotyped. In the RARE haplotyping procedure one then uses an oligonucleotide to form a D loop with the segment of DNA that contains the polymorphic restriction site (the site that will ultimately be used to capture the DNA segment to be haplotyped). (The other allele present in the analyte sample lacks the restriction enzyme recognition sequence as a consequence of the polymorphism.) Formation of the D loop can be enhanced by addition of E. coli RecA protein, which assembles around the single stranded DNA to form a nucleoprotein filament that then searches double stranded DNA fragments until it locates a homologous sequence. RecA protein, in a complex with a gamma-thio analog of ATP (ATPγS) and a 30-60 nucleotide long oligodeoxynucleotide complementary or identical to the targeted site in which the protected restriction site is embedded, then mediates strand invasion by the oligodeoxynucleotide, forming the D loop.

[0210] Once the D loop is formed, the next step is to methylate all copies of the polymorphic restriction site using a DNA methylase. Substantially all copies of the restriction site present in the genomic DNA mixture are methylated. (One nucleotide within the site, typically a C or an A, is methylated; Eco RI methylase, for example, methylates an adenine.) The one polymorphic restriction site that participates in the D loop is not methylated because the D loop is not recognized by the DNA methylase. Next the D loop is disassembled and the methylase inactivated or removed. This leaves the targeted restriction site available for restriction enzyme binding (on the one allele that contains the restriction site). Finally, the restriction-inactive but high affinity binding protein (e.g. Eco RI Gln111) is added to the mixture of genomic DNA fragments. The only fragment that should have an available Eco RI site is the fragment to be haplotyped. Any of several methods can be used to selectively remove that fragment: the cleavage-inactive restriction enzyme can be fused to a protein that serves as a handle to facilitate easy removal by nickel-histidine, antibody-antigen or other protein-protein interaction, as described in detail elsewhere in this invention. Alternatively, an antibody against the restriction enzyme can be prepared and used to capture the restriction enzyme-allele fragment complex on a bead or column to which the antibody is bound, or other methods known in the art can be employed.
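The logic of the RARE protection step can be sketched as bookkeeping over a set of fragments: every copy of the site is methylated except the one shielded by the D loop, so only that copy remains available to the binding protein. The sketch below assumes, as the procedure does, that the cleavage-inactive enzyme does not bind methylated sites; the function names and the toy fragments are hypothetical.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` on either strand."""
    hits = set()
    for motif in {site, revcomp(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.add(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def rare_available_sites(fragments: dict[str, str], site: str,
                         protected: set[tuple[str, int]]) -> tuple[list[tuple[str, int]], int]:
    """Return (sites left unmethylated and bindable, number of sites methylated).

    `protected` holds (fragment name, start) pairs covered by the D loop.
    """
    available, methylated = [], 0
    for name, seq in sorted(fragments.items()):
        for start in find_sites(seq, site):
            if (name, start) in protected:
                available.append((name, start))  # shielded from the methylase
            else:
                methylated += 1                  # methylated, assumed not bound
    return available, methylated
```

With two GAATTC sites on a "target" fragment and two on an "other" fragment, protecting only the first target site leaves exactly that one site available, regardless of how many additional copies lie nearby.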

[0211] The advantage of the RARE assisted haplotyping method is that the local restriction map, and in particular the occurrence of other Eco RI sites (in this example) nearby, is no longer a limitation. Further, the methylation of all sites save the polymorphic site eliminates the preference for restriction enzymes that recognize 6 or more nucleotides. With the RARE haplotyping technique any enzyme, including one that recognizes a four nucleotide sequence, is effective for allele enrichment. This is a particularly useful aspect of the invention because four nucleotide sequences recognized by restriction enzymes encompass polymorphic sites more often than 5 or 6 nucleotide sequences do, and there are more DNA methylases for 4 nucleotide sequences than for 6 nucleotide sequences recognized by restriction enzymes. Preferred restriction sites for RARE assisted haplotyping are those for which DNA methylases are commercially available, including, without limitation, Alu I, Bam HI, Hae III, Hpa II, Taq I, Msp I, Hha I, Mbo I and Eco RI methylases.

[0212] The use of peptides for allele enrichment is described below inthe discussion of small molecules that can be used for alleleenrichment.

[0213] 3.4 Haplotype Determination by Nucleic Acid-Based Allele Selection Methods

[0214] In another aspect of the invention, nucleic acids and nucleic acid analogs that bind specifically to double stranded DNA can be targeted to polymorphic sites and used as the basis for physical separation of alleles. Ligands attached to the targeting oligonucleotides, such as, without limitation, biotin, fluorescein, polyhistidine or magnetic beads, can provide the basis for subsequent enrichment of bound alleles. Sequence specific methods for the capture of double stranded DNA that are useful for the haplotyping methods of this invention include: (i) triple helical interactions between single stranded DNA (e.g. oligonucleotides) and double stranded DNA via Hoogsteen or reverse Hoogsteen base pairing; (ii) D-loop formation, again between a single stranded DNA and a double stranded DNA; (iii) D-loop formation between peptide nucleic acid (PNA) and a double stranded DNA; and (iv) in vitro nucleic acid evolution methods (referred to as SELEX), which can be used to derive natural or modified nucleic acids (aptamers) that bind double stranded DNA in a sequence specific manner via any combination of Watson-Crick or Hoogsteen base pairing, hydrogen bonds, van der Waals forces or other interactions.

[0215] The D loop is formed by the displacement of one strand of the double helix by the invading single strand. RecA protein, as indicated above, facilitates D loop formation, albeit with only limited stringency for the extent of homology between the invading and invaded sequences. Triple helical interactions are useful for allele enrichment when a polymorphic site lies in a sequence context that conforms to the requirements for Hoogsteen or reverse Hoogsteen base pairing. The sequence requirements generally include a homopyrimidine/homopurine sequence in the double stranded DNA; however, the discovery of nucleotide analogs that base pair with pyrimidines in triplex structures has increased the repertoire of sequences that can participate in triple stranded complexes. Nonetheless, a more general scheme for selective binding to polymorphic DNA sequences is preferable.

[0216] 3(b). Haplotype Determination by Nucleic Acid-Based Double Stranded Allele Selection Methods

[0217] In another aspect of the invention, nucleic acids that bind specifically to double stranded DNA can be targeted to polymorphic sites and used as the basis for physical separation of alleles. The best known types of specific interactions involve triple helical interactions formed via Hoogsteen or reverse Hoogsteen base pairing. These interactions are useful for haplotyping when a polymorphic site lies within a sequence context that conforms to the requirements for Hoogsteen or reverse Hoogsteen base pairing. These requirements typically include a homopyrimidine/homopurine sequence; however, the discovery of nucleic acid modifications that permit novel base pairings is resulting in an expanded repertoire of sequences. Nonetheless, a more general scheme for selective binding to polymorphic DNA sequences is preferable.
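Whether a polymorphic site sits in a context suitable for conventional triplex formation can be screened by finding the homopurine/homopyrimidine run that contains it on each allele. The sketch below, with hypothetical names, reports that run length for the reference and alternative base; any minimum run length used for triplex design would be an assumption to be set empirically, and modified bases that relax the sequence requirement are not modeled.

```python
PURINES = frozenset("AG")


def same_class_tract(seq: str, pos: int) -> tuple[int, int]:
    """Half-open [start, end) of the maximal homopurine or homopyrimidine run
    on the given strand that contains `pos`. A homopurine run on one strand
    is a homopyrimidine run on the other, so one scan covers both."""
    is_purine = seq[pos] in PURINES
    start = pos
    while start > 0 and (seq[start - 1] in PURINES) == is_purine:
        start -= 1
    end = pos + 1
    while end < len(seq) and (seq[end] in PURINES) == is_purine:
        end += 1
    return start, end


def allele_tract_lengths(seq: str, pos: int, alt: str) -> dict[str, int]:
    """Run length around `pos` for the reference base and for `alt`."""
    lengths = {}
    for base in (seq[pos], alt):
        variant = seq[:pos] + base + seq[pos + 1:]
        start, end = same_class_tract(variant, pos)
        lengths[base] = end - start
    return lengths
```

A transition (e.g. G to A) inside a purine run preserves the run on both alleles, so discrimination would rest on a Hoogsteen mismatch; a transversion (e.g. G to C) breaks the run on one allele.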

[0218] In another aspect of the invention, the formation of D loops by strand invasion of dsDNA can be the basis for an allele specific interaction, and secondarily for an allele enrichment scheme. Peptide nucleic acid (PNA) is a preferred material for strand invasion. Owing to its high affinity DNA binding, PNA has been shown to be capable of high efficiency strand invasion of duplex DNA. (Peffer N J, Hanvey J C, Bisi J E, et al. Strand-invasion of duplex DNA by peptide nucleic acid oligomers. Proc Natl Acad Sci USA. Nov. 15, 1993;90(22):10648-52; Kurakin A, Larsen H J, Nielsen P E. Cooperative strand displacement by peptide nucleic acid (PNA). Chem Biol. February 1998;5(2):81-9.) A PNA strand invasion affinity selection would be conceptually similar to the protein-based methods, except that the sequence-specific DNA-PNA complexes formed by strand invasion are the basis of an enrichment procedure that exploits an affinity tag attached to the PNA. The affinity tag may be a hapten recognized by an antibody, such as fluorescein or rhodamine; polyhistidine (to be selected by nickel affinity chromatography); biotin (to be selected using avidin- or streptavidin-coated beads or surfaces); or other affinity selection schemes known to those skilled in the art.

[0219] In another embodiment of the invention, in vitro nucleic acid evolution methods (referred to as aptamers or SELEX) can be used to derive natural or modified nucleic acids that bind double stranded DNA in a sequence specific manner. Methods for high throughput derivation of nucleic acids capable of binding virtually any target molecule have been described. (Drolet D W, Jenison R D, Smith et al. A high throughput platform for systematic evolution of ligands by exponential enrichment (SELEX). Comb Chem High Throughput Screen. October 1999;2(5):271-8.)

[0220] 3(c). Other Double Stranded Allele Selection Methods

[0221] In another aspect of the invention, non-protein, non-nucleic acid molecules can be the basis for affinity selection of double stranded DNA. (Mapp A K, Ansari A Z, Ptashne M, et al. Activation of gene expression by small molecule transcription factors. Proc Natl Acad Sci USA. April 11, 2000;97(8):3930-5; Dervan P B, Burli R W. Sequence-specific DNA recognition by polyamides. Curr Opin Chem Biol. December 1999;3(6):688-93; White S, Szewczyk J W, Turner J M, et al. Recognition of the four Watson-Crick base pairs in the DNA minor groove by synthetic ligands. Nature. Jan. 29, 1998;391(6666):468-71.)

[0222] 4. Haplotype Determination by Allele Specific Capture of Single Stranded DNA or RNA

[0223] In this example, modified oligonucleotides or modified nucleotide triphosphates are used as affinity reagents to partially purify a complementary DNA species (the allele to be haplotyped) with which they have formed a duplex. The nucleotide or oligonucleotide modification may constitute, for example: addition of a compound that binds with high affinity to a known partner (such as biotin/avidin or polyhistidine/nickel); covalent addition of a compound for which high affinity antibodies are available (such as rhodamine or fluorescein); addition of a metal that allows physical separation using a magnetic field; or addition of a reactive chemical group that, upon addition of a specific reagent or physical energy (e.g. UV light), will form a covalent bond with a second compound that in turn is linked to a molecule or structure that enables physical separation.

[0224] An example of such a modification is biotin. DNA or RNA, once hybridized to biotinylated oligonucleotides or nucleotides, can be separated from non-hybridized DNA or RNA using streptavidin on a solid support. Other possible modifications include, but are not limited to: antigens and antibodies, peptides, nucleic acids, and proteins that, when attached to oligonucleotides or nucleotides, would bind to some other molecule on a solid support. For the purposes of this disclosure, biotin will be used as the exemplary modification of the oligonucleotides or nucleotides, and streptavidin will be attached to the solid support.

[0225] Oligonucleotides used in the descriptions below can be composed of normal nucleotides and/or linkages, modified nucleotides and/or linkages, or both. The only requirements are that the oligonucleotides retain the ability to hybridize to DNA or RNA and that they can be utilized by the appropriate enzymes if necessary. Examples of modified oligonucleotides include, but are not limited to, peptide nucleic acid molecules and oligonucleotides with phosphorothioate or methylphosphonate modifications. The term oligonucleotide, when used below, refers to both natural and modified oligonucleotides.

[0226] The following are examples of employing allele specific capture of DNA or RNA to determine haplotypes:

[0227] 1. A biotinylated oligonucleotide directed against a site that is heterozygous for a nucleotide variance is allowed to hybridize to the target DNA or RNA under conditions that will result in binding of the oligonucleotide to only one of the two alleles present in the sample. The length, the position of the mismatch between the oligonucleotide and the target sequence, and the chemical make-up of the oligonucleotide can all be adjusted to maximize allele specific discrimination. Streptavidin on a solid support is used to remove the biotinylated oligonucleotide and any DNA or RNA associated with it by hybridization. For example, allele 1 is specifically captured by hybridization of an oligonucleotide containing a T at the variance site. The target DNA or RNA from allele 1 is then dissociated from the oligonucleotide and solid support under denaturing conditions. The isolated RNA or DNA from allele 1 can then be genotyped to determine the haplotype. Alternatively, the RNA or DNA remaining in the sample following capture and removal of allele 1 (i.e. allele 2) can be genotyped to determine its haplotype.

[0228] 2. The target DNA is incubated with two oligonucleotides, one of which is biotinylated (FIG. 15). If RNA is to be used in this example it must first be converted to cDNA. The oligonucleotides are designed to hybridize adjacent to one another at the site of variance. For example, the 3′ end of the biotinylated oligonucleotide hybridizes one base 5′ of the variant base. The other oligonucleotide hybridizes adjacent to the biotinylated oligonucleotide, with its 5′-most nucleotide hybridizing to the variant base. If there is a perfect match at the site of variance (allele 1), the two oligonucleotides are ligated together. However, if there is a mismatch at the site of variance (allele 2), no ligation occurs. The sample is then allowed to bind to the streptavidin on the solid support under conditions that are permissive for the hybridization of the ligated oligonucleotides but non-permissive for the hybridization of the shorter non-ligated oligonucleotides. The captured oligonucleotides and hybridized target DNA are removed from the sample, and the target DNA is eluted from the solid support and genotyped to determine the haplotype. Alternatively, allele 2 can be genotyped to determine its haplotype after removal of allele 1 from the sample.
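The probe layout and the ligation rule can be sketched as follows. To avoid orientation errors, both oligonucleotides are written as substrings of the "probe strand", i.e. the sequence complementary to the target, read 5′ to 3′, so that a perfectly matched oligonucleotide is a literal substring of it. The model checks only the junction base and the upstream oligonucleotide; ligase fidelity, stacking and hybridization stringency are not modeled, and all names are hypothetical.

```python
def ligation_probes(probe_strand: str, v: int, len_up: int, len_down: int) -> tuple[str, str]:
    """Design the two oligos for variant index `v` on the probe strand."""
    up = probe_strand[v - len_up:v]      # biotinylated; its 3' end abuts the variant
    down = probe_strand[v:v + len_down]  # its 5'-most base sits on the variant
    return up, down


def ligates(allele_probe_strand: str, v: int, up: str, down: str) -> bool:
    """True if both oligos are matched at the junction, so ligation can occur.

    `allele_probe_strand` is the probe-strand sequence that would perfectly
    match a given allele.
    """
    upstream_ok = allele_probe_strand[v - len(up):v] == up
    junction_ok = allele_probe_strand[v] == down[0]
    return upstream_ok and junction_ok


if __name__ == "__main__":
    allele1 = "ACGTTGCA"   # toy probe-strand sequences differing at index 4
    allele2 = "ACGTCGCA"
    up, down = ligation_probes(allele1, 4, 4, 4)
    print(up, down, ligates(allele1, 4, up, down), ligates(allele2, 4, up, down))
```

Only molecules from the matched allele yield the longer ligated product that survives the stringent capture conditions; the same function can be used to check that the mismatch lands at the junction for the allele to be excluded.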

[0229] The size of the oligonucleotides can be varied in order to increase the likelihood that hybridization and ligation will occur only when the correct allele is present. The ligation can be done under conditions that allow hybridization of a shorter oligonucleotide only if it is hybridized next to the perfectly matched oligonucleotide and can make use of the stacking energy for stabilization. Also, either the biotinylated oligonucleotide or the other oligonucleotide can contain the mismatch. The biotin can be placed on the 5′ or 3′ end of the oligonucleotide as long as it is not at the site of ligation.

[0230] 3. An oligonucleotide is hybridized to the target DNA such that the 3′ end of the oligonucleotide lies just 5′ of the variant base. If RNA is to be used in this example it is first converted to cDNA. The sample is then incubated in the presence of four dideoxy nucleotides, one of which contains a biotin, with a polymerase capable of extending the primer by incorporating dideoxy nucleotides. The biotinylated dideoxy nucleotide is selected to correspond to one of the variant bases, such that it will be incorporated only if the corresponding base is at the site of variance. For example, if the base chosen is biotin-ddTTP, it will be incorporated only when the primer anneals to allele 1. The primer with the incorporated biotinylated dideoxy nucleotide, hybridized to allele 1, is separated from the rest of the DNA in the sample using streptavidin on a solid support. The isolated allele 1 can then be eluted from the solid support and genotyped to determine the haplotype. As above, allele 2, which is left in the sample after capture and removal of allele 1, can also be genotyped to determine its haplotype.

[0231] The dideoxy and biotinylated nucleotides do not have to be the same nucleotide. The primer could be extended in the presence of one biotinylated nucleotide, one dideoxy nucleotide and two normal nucleotides. For example, a biotinylated dTTP and a normal dGTP would be added along with another normal nucleotide (not dTTP or dGTP) and a dideoxy nucleotide (not ddTTP or ddGTP). The dideoxy nucleotide would be chosen so that the extension reaction is terminated before the next position at which the biotinylated dTTP could be incorporated. Extension from the primer on allele 1 would result in the incorporation of a biotinylated dTTP. Extension from the primer on allele 2 would result in the incorporation of a normal dGTP. Streptavidin on a solid support could be used to separate allele 1 from allele 2 for genotyping to determine the haplotype.
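Both primer extension schemes can be checked in silico with a small simulation. The sketch below takes the bases the polymerase must add (written 5′ to 3′, starting at the variant) and a nucleotide mix in which each base is supplied in exactly one form, dideoxy or deoxy and biotinylated or not. Extension stops at the first dideoxy incorporation or at the first base with no nucleotide supplied. The names and toy sequences are hypothetical, and competition between dNTP and ddNTP forms of the same base is deliberately not modeled.

```python
# mix maps base -> (is_dideoxy, is_biotinylated); one form per base (assumption).
Mix = dict[str, tuple[bool, bool]]


def extend(product_seq: str, mix: Mix) -> list[tuple[str, bool]]:
    """Bases incorporated from the primer's 3' end, with their biotin flag."""
    incorporated = []
    for base in product_seq:
        if base not in mix:
            break  # no nucleotide supplied for this position: extension stalls
        is_dd, is_biotin = mix[base]
        incorporated.append((base, is_biotin))
        if is_dd:
            break  # dideoxy nucleotide lacks a 3'-OH: chain terminates
    return incorporated


def is_captured(product_seq: str, mix: Mix) -> bool:
    """True if at least one biotinylated nucleotide was incorporated."""
    return any(is_biotin for _, is_biotin in extend(product_seq, mix))


if __name__ == "__main__":
    # Paragraph [0230]: four ddNTPs, only ddTTP biotinylated.
    all_dd = {"A": (True, False), "C": (True, False), "G": (True, False), "T": (True, True)}
    # Paragraph [0231]: biotin-dTTP, dGTP, dATP, and ddCTP as terminator.
    mixed = {"T": (False, True), "G": (False, False), "A": (False, False), "C": (True, False)}
    for mix in (all_dd, mixed):
        print(is_captured("TAACTG", mix), is_captured("GAACTG", mix))
```

For the toy sequences, allele 1 (T at the variant) is captured under both mixes and allele 2 (G at the variant) is not; if a T preceded the first C on allele 2, the second mix would wrongly capture it, which is why the dideoxy nucleotide must be chosen with the downstream sequence in mind.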

[0232] 4(a). Protein-Based Single Stranded Allele Selection Methods

[0233] 4(c). Other Single Stranded Allele Selection Methods

[0234] 5. Post-Allele Selection Genotyping Methods

[0235] 6. Allele Selective Amplification Haplotyping Methods

[0236] Described in this application is an alternative method for obtaining allele specific PCR products. Our method also entails using modified primers; however, the basis for achieving allele specific amplification is the formation of a duplex or secondary structure involving base pairing between (i) nucleotides at or near the 3′ end of a strand (said nucleotides being at least partially templated by a primer for the complementary strand) and (ii) nucleotides of the same strand that lie further interior from the 3′ end and include (crucially) a polymorphic site (or sites), such that: (i) the secondary structure is formed to a different extent in the two alleles (ideally the secondary structure is formed in a completely allele specific manner), and (ii) the secondary structure at least partially inhibits primer binding and/or primer extension, and consequently inhibits amplification of the strand with the secondary structure at the 3′ end. The point of the primer modification, then, is to produce a template for polymerization on the complementary strand, leading to a sequence that will form a secondary structure that inhibits further primer binding/extension from that end. The modification in the primer can be introduced either at the 5′ end or internally, but not at the 3′ end of the primer. An example of this method applied to haplotyping the ApoE gene is provided below (Example 3), along with FIGS. 14-17, which illustrate some of the types of secondary structure that can be produced to inhibit primer binding/extension.

[0237] One implementation of the method entails introducing a 5′ extension in a primer. After a complementary strand is extended across that primer, and then separated by a cycle of denaturation, the complementary strand forms a hairpin loop structure in one allele but not the other. Specifically, the free 3′ end of the complementary strand anneals to an upstream segment of the same strand that includes the polymorphic site, such that the polymorphic site participates in the stem of the loop (see FIGS. 14, 15). If the polymorphic nucleotide is complementary to the nucleotide near the 3′ end of the strand, a tight stem will be formed. If not, a lower affinity interaction will exist and, under appropriately selected conditions, the stem will not form. Since the formation of the stem makes the 3′ end of the strand unavailable for binding free primer, amplification of the strand in which a perfect stem is formed is inhibited, as shown in Example 1. The length of the 5′ extension on the primer can be varied, depending on the desired size of the loop, or on whether it is desirable to form alternative structures or enzyme recognition sites.
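The stem design can be checked by folding the 3′-terminal bases of a strand back onto an upstream window that contains the polymorphic site and counting Watson-Crick mismatches. Because the 3′ end of the strand is copied from the opposite primer's 5′ extension, it equals the reverse complement of that extension; a perfect stem therefore requires the extension to equal the window sequence itself, as read 5′ to 3′ on the strand that folds. The sketch below encodes this reasoning under the simplifying assumption of a perfectly paired, unbulged stem; the function names are hypothetical.

```python
COMP = str.maketrans("ACGT", "TGCA")
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def stem_mismatches(strand: str, window_start: int, stem_len: int) -> int:
    """Non-Watson-Crick pairs when the 3'-terminal `stem_len` bases of `strand`
    (written 5'->3') fold back, antiparallel, onto
    strand[window_start:window_start + stem_len]."""
    if window_start + stem_len > len(strand) - stem_len:
        raise ValueError("window overlaps the 3'-terminal stem bases")
    mismatches = 0
    for j in range(stem_len):
        if (strand[window_start + j], strand[-1 - j]) not in WC_PAIRS:
            mismatches += 1
    return mismatches


def primer_extension_for_stem(window: str) -> str:
    """5' extension for the opposite primer that makes the strand's 3' end
    equal revcomp(window), i.e. a perfect stem for the allele given."""
    return window


if __name__ == "__main__":
    ext = primer_extension_for_stem("AGGT")
    allele1 = "TT" + "AGGT" + "CCCCCC" + revcomp(ext)   # 3' end copied from ext
    allele2 = "TT" + "AAGT" + "CCCCCC" + revcomp(ext)   # SNP at the second window base
    print(stem_mismatches(allele1, 2, 4), stem_mismatches(allele2, 2, 4))
```

In this toy example allele 1 forms a perfect four-base stem and is the strand whose amplification is inhibited, while allele 2 carries one mismatch in the stem.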

[0238] Some of the alternative structures that would be useful include: (i) recognition sites for various DNA modifying enzymes such as restriction endonucleases, (ii) a cruciform DNA structure, which could be very stable or could be recognized by enzymes such as bacteriophage resolvases (e.g., T4 endonuclease VII, T7 endonuclease I), or (iii) recognition sites for DNA binding proteins (preferably from thermophilic organisms) such as zinc finger proteins, catalytically inactive endonucleases, or transcription factors. The purpose of inducing such structures in an allele specific manner would be to effect allele specific binding to, or modification of, DNA. For example, consider a duplex formed only (or preferentially) by a strand from one allele that contains the recognition sequence for a thermostable restriction enzyme such as Taq I. Allele specific strand cleavage could be achieved by inclusion of (thermostable) Taq I during the PCR, resulting in complete inactivation of each cleaved template molecule and thereby leading to allele selective amplification. What are the limits of such an approach? One requirement is that there are no Taq I sites elsewhere in the PCR amplicon; another is that one of the two alleles must form a Taq I recognition sequence.
The former limitation, which would restrict the length of amplicons that could be allele specifically modified if only frequently occurring restriction sites were used, and the latter limitation, which requires that polymorphisms occur in restriction endonuclease recognition sites, can both be addressed in part by designing a 5′ primer extension, along with an internal primer loop, such that the recognition sequence for a rare cutting restriction endonuclease that (i) is an interrupted palindrome, or (ii) cleaves at some distance from its recognition sequence, is formed by the internal loop, while (i) the other end of the interrupted palindrome, or (ii) the cleavage site for the restriction enzyme, occurs at the polymorphic nucleotide and is therefore sensitive to whether there is a duplex or a (partially or completely) single stranded region at the polymorphic site. This scheme is illustrated in FIG. 20. Preferred enzymes for PCR implementation of these schemes would include enzymes from thermophiles, such as Bsl I (CCNNNNN/NNGG) and Mwo I (GCNNNNN/NNGC).
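Interrupted palindromes such as Bsl I (CCNNNNN/NNGG) and Mwo I (GCNNNNN/NNGC) can be located with a regular expression in which each N matches any base. The sketch below uses a zero-width lookahead so that overlapping sites are reported, and returns each site's start together with the top-strand cut position implied by the "/" in the recognition sequence. Both patterns are their own reverse complements, so scanning one strand suffices; the dictionary and function names are hypothetical.

```python
import re

# Recognition sequence (5'->3') and number of bases before the top-strand cut.
ENZYMES = {
    "BslI": ("CC" + "N" * 7 + "GG", 7),   # CCNNNNN/NNGG
    "MwoI": ("GC" + "N" * 7 + "GC", 7),   # GCNNNNN/NNGC
}


def find_interrupted_sites(seq: str, enzyme: str) -> list[tuple[int, int]]:
    """(site start, top-strand cut position) for every site, overlaps included."""
    pattern, cut = ENZYMES[enzyme]
    regex = re.compile("(?=" + pattern.replace("N", "[ACGT]") + ")")
    return [(m.start(), m.start() + cut) for m in regex.finditer(seq)]


if __name__ == "__main__":
    allele1 = "AAACCTTTTTTTGGAAA"   # complete Bsl I site
    allele2 = "AAACCTTTTTTTAGAAA"   # polymorphism disrupts the GG half-site
    print(find_interrupted_sites(allele1, "BslI"), find_interrupted_sites(allele2, "BslI"))
```

Applied to candidate amplicon and loop designs, a scan of this kind can confirm that a complete site exists only for the intended allele and nowhere else in the amplicon.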

[0239] Other alternative schemes would entail placing the stem-forming nucleotides internally, rather than at the end of the primer.

[0240] The experiments described above and in Example 1 are directed to stem formation during PCR, which requires that the stem be stable at an annealing temperature of ~50° C. or greater. However, isothermal amplification methods, such as 3SR and others, can also be used to achieve allele specific amplification. For isothermal amplification methods the loop forming sequences would likely be designed differently, to achieve maximum allele discrimination in secondary structure formation at 37° C., 42° C. or other temperatures suited to amplification. This can be achieved by shortening the duplex regions. Example 1 gives typical lengths of duplex regions for PCR-based methods. Shorter duplex lengths can be tested empirically for isothermal amplification methods.
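Choosing shorter duplex lengths to test empirically can start from a rough melting temperature estimate. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a crude approximation for short duplexes in high salt and does not account for the extra stability of intramolecular hairpins; it returns the shortest prefix of a candidate stem, anchored at the end carrying the polymorphic site (an assumption), whose estimate reaches a target temperature. Function names are hypothetical.

```python
from typing import Optional


def wallace_tm(duplex: str) -> float:
    """Wallace rule estimate (deg C): 2 per A/T plus 4 per G/C."""
    at = sum(duplex.count(b) for b in "AT")
    gc = sum(duplex.count(b) for b in "GC")
    return 2.0 * at + 4.0 * gc


def shortest_stem(stem: str, target_tm: float) -> Optional[str]:
    """Shortest prefix of `stem` whose Wallace Tm is at least `target_tm`,
    or None if even the full stem falls short."""
    for n in range(1, len(stem) + 1):
        if wallace_tm(stem[:n]) >= target_tm:
            return stem[:n]
    return None


if __name__ == "__main__":
    candidate = "GCGCGCGCGCGC"
    for temp in (37.0, 42.0, 50.0):
        print(temp, shortest_stem(candidate, temp))
```

The estimates are starting points for the empirical testing described above, not predictions of allele discrimination.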

[0241] Our method does not address the intrinsic limitations of all PCR-based methods, but it does provide superior performance compared to other procedures in the literature: excellent allele specificity can be achieved at fragment lengths of up to 4 kb.

[0242] 7. Restriction Endonuclease-Based Haplotyping Methods

[0243] The first type of polymorphism used to produce high density human genetic maps was the restriction fragment length polymorphism (RFLP). RFLPs are polymorphisms, usually but not necessarily SNPs, that affect restriction endonuclease recognition sites. Initially RFLPs were identified, and subsequently typed, using Southern blots of genomic DNA. An RFLP was detected when the pattern of hybridizing species in a Southern blot (hybridized with a single copy probe) varied from sample to sample (i.e. from lane to lane of the Southern blot). Generally one detectable fragment would be identified in some lanes, one or two smaller fragments in other lanes, and both the large and smaller fragments in still other lanes, corresponding respectively to homozygotes for the allele lacking the restriction site, homozygotes for the allele containing the restriction site and heterozygotes for the two alleles. The size difference between the restriction fragments lacking the polymorphic restriction site and those with the restriction site depends on the distance from the polymorphic restriction site to the flanking, non-polymorphic sites for the same restriction enzyme.
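The band patterns described above follow directly from the positions of the polymorphic site and its two flanking, non-polymorphic sites. The sketch below computes the expected fragment sizes for each genotype, simplifying each cut to a single coordinate and assuming the probe detects every fragment in the interval (a probe lying on one side of the polymorphic site would detect only one of the two smaller fragments). Names and coordinates are hypothetical.

```python
def rflp_fragments(left: int, poly: int, right: int, has_site: bool) -> list[int]:
    """Fragment sizes (bp) between flanking cuts at `left` and `right`, with an
    optional cut at the polymorphic position `poly`."""
    if has_site:
        return [poly - left, right - poly]
    return [right - left]


def southern_pattern(left: int, poly: int, right: int,
                     genotype: tuple[bool, bool]) -> list[int]:
    """Distinct band sizes, largest first, for a pair of alleles
    (True = polymorphic site present on that allele)."""
    sizes = set()
    for has_site in genotype:
        sizes.update(rflp_fragments(left, poly, right, has_site))
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    for g in ((False, False), (True, True), (True, False)):
        print(g, southern_pattern(0, 3000, 8000, g))
```

With flanking cuts at 0 and 8,000 bp and the polymorphic site at 3,000 bp, the three genotypes give one 8 kb band, 5 kb and 3 kb bands, or all three bands, matching the lane patterns described in the text.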

[0244] In the past the location of polymorphic restriction sites and the sizes of the restriction products have generally been determined empirically. Although many restriction site polymorphisms have been converted to PCR assays by designing oligonucleotide primers flanking the polymorphic site, these assays lack the character of the initial RFLP assays, in which the restriction enzyme did all the work and the size of the restriction fragments varied over a wide range.

[0245] In one aspect of this invention, RFLPs can be used to produce long range haplotypes, over distances of at least 5 kb, frequently over 10 kb and, in some instances using rarely occurring restriction sites, over distances of up to 100 kb or greater. The basic approach is to: (i) select a DNA segment to be haplotyped (the exact boundaries will be constrained by the next step); (ii) identify a polymorphism, either within the segment or, preferably, in flanking DNA, that alters a recognition site for a restriction endonuclease (RE1); the outer bounds of the segment to be haplotyped are defined by the nearest occurrence of RE1 on either side of the polymorphic site; (iii) prepare genomic DNA from samples that are heterozygous for the polymorphism identified in step (ii); it is desirable that the average length of the genomic DNA be greater than the length of the DNA fragment being haplotyped; (iv) restrict the genomic DNA with the enzyme that recognizes the selected polymorphic site; (v) separate the restricted DNA using any DNA size fractionating method suitable to the size range of the restriction fragments of interest (including gel electrophoresis; centrifugation through a gradient of salt, sucrose or other material, including use of step gradients; chromatography using Sephadex or other material; or other methods known in the art); (vi) isolate a first DNA fraction containing the larger restriction fragment and, optionally, a second DNA fraction containing the smaller restriction fragment and, if necessary, purify DNA from each fraction for PCR. It is not necessary that the fragments be highly enriched in the fractions, only that each of the one or more DNA fractions contain a significantly greater quantity of one allele than of the other. A minimum differential allele enrichment that would be useful for haplotyping is 2:1, more preferably at least 5:1 and most preferably 10:1 or greater.
(vii) Genotype the polymorphic sites of interest in either one of the fractions (the one enriched for the larger allele or the one enriched for the smaller allele) or, optionally, determine genotypes separately in both size fractions. Since each fraction contains principally one allele, the genotypes of the fractions provide the haplotypes of the enriched alleles. If only one fraction is genotyped, providing one haplotype, the other haplotype can be inferred by subtracting the determined haplotype from the genotype of the total genomic DNA of the sample. In a haplotyping project it is desirable to determine the genotypes in total genomic DNA of all samples of interest in advance, for three reasons. First, this determines which samples actually require haplotype analysis (because they contain two or more heterozygous sites in the segment of interest). Second, it determines which samples are heterozygous at the restriction site polymorphism selected for size separation of the alleles, and are therefore suitable for analysis by the above method. Third, the genotype of the total sample constrains the possible haplotypes and provides a check on the accuracy of the haplotypes. Preferably the haplotypes of both alleles are determined separately and compared to the genotype of the unfractionated sample. Samples that are not suitable for haplotype analysis with one restriction enzyme (because they are not heterozygous at the restriction site) can be analyzed with a different restriction enzyme, using the steps described above.
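The subtraction step in (vii) and the enrichment thresholds of step (vi) can be illustrated with a short Python sketch. It assumes that each site's total-DNA genotype is available as an unordered two-letter call and that the enriched fraction yields one base per site. The function names and the signal-ratio check are illustrative assumptions, not part of the described method.

```python
# Minimal sketch (not the patent's software): recover the second haplotype by
# "subtracting" a directly measured haplotype from the total-DNA genotype.


def infer_other_haplotype(genotype, known):
    """Return the other haplotype as a string, one base per site.

    genotype: unordered diploid calls from unfractionated DNA, e.g. ["AG", "CC"]
    known:    haplotype read from the enriched fraction, e.g. "AC"
    Raises ValueError when the measured haplotype conflicts with the genotype,
    which is the accuracy check provided by genotyping total DNA first.
    """
    if len(genotype) != len(known):
        raise ValueError("genotype and haplotype cover different numbers of sites")
    other = []
    for site, (call, base) in enumerate(zip(genotype, known)):
        alleles = list(call)
        if base not in alleles:
            raise ValueError(f"site {site}: {base!r} is not in genotype {call!r}")
        alleles.remove(base)  # drop one copy; what remains is the other allele
        other.append(alleles[0])
    return "".join(other)


def meets_enrichment(major_signal, minor_signal, minimum=2.0):
    """True if a fraction's allele ratio reaches `minimum` (2:1 by default)."""
    if minor_signal == 0:
        return major_signal > 0
    return major_signal / minor_signal >= minimum


print(infer_other_haplotype(["AG", "CC", "TC"], "ACT"))  # GCC
```

A measured base that is absent from the total-DNA genotype raises an error instead of being silently accepted. This mirrors the use of the unfractionated genotype as an accuracy check.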

[0246] In another aspect, two restriction enzymes plus an exonuclease can be used in a haplotyping scheme that does not require a size separation step. In this method, illustrated in FIGS. 18, 19 and 20, the initial steps are as above: (i) select a DNA segment to be haplotyped (the exact boundaries will be constrained by the next two steps); (ii) identify a polymorphism, either within the segment or, preferably, in flanking DNA, that alters a recognition site for a restriction endonuclease (RE1); the outer bounds of the segment to be haplotyped are defined by the nearest occurrence of RE1 on either side of the polymorphic site; (iii) identify a second restriction endonuclease (RE2) that cleaves only once within the segment to be haplotyped; (iv) prepare genomic DNA from samples that are heterozygous for the polymorphism identified in step (ii); it is desirable that the average length of the genomic DNA be greater than the length of the DNA fragment being haplotyped; (v) restrict the genomic DNA with RE1; (vi) block the ends of all cleavage products from exonuclease digestion. This can be done in any of three ways. The first is to select an RE1 that produces termini not susceptible to exonuclease digestion; for example, 3′ protruding termini are resistant to digestion by E. coli Exonuclease III. The second is to fill in recessed termini with nuclease-resistant modified nucleotides, such as dideoxynucleotides, 5′-amino-deoxynucleotide analogs, 2′-O-methyl nucleotide analogs, 2′-methoxy-ethoxy nucleotide analogs, 4-hydroxy-N-acetylprolinol nucleotide analogs or other chemically modified nucleotides (as described in PCT application PCT US 99/22988, entitled METHODS FOR ANALYZING POLYNUCLEOTIDES). The third is to ligate adapters carrying nuclease-resistant modifications to the restriction termini; (vii) restrict with RE2. At this point, the two alleles in the DNA region of interest are in different states.
Allele A was cleaved in two by RE1 at the polymorphic site, and both fragments were blocked from exonuclease digestion. RE2 then cleaved one of the two fragments into two pieces, each of which has one end unprotected from exonuclease (a requirement of RE2 is that it produce termini that are susceptible to exonuclease digestion). Crucially, the fragment not cleaved by RE2 is still protected at both termini. Conversely, Allele B, lacking an RE1 site at the polymorphic site, remained in one piece after RE1 digestion. RE2 digestion cleaved that piece in two, and both halves are susceptible to exonuclease digestion from their unprotected ends. Thus the nuclease acts on the entire segment to be haplotyped in Allele B. (viii) After nuclease digestion, or at the same time, a small amount of a single-strand-specific nuclease may be added in order to destroy any single-stranded regions left after the exonuclease treatment. This is important only if the first nuclease has no single-strand nuclease activity (as is the case, for example, with E. coli Exonuclease III). The nuclease(s) can be inactivated, for example by heating, if necessary. (ix) A genotyping procedure can then be used to determine the status of all polymorphic sites in the segment of Allele A that did not contain the RE2 site and thus remained blocked at both ends during the exonuclease treatment. Since little or no Allele B remains in the test tube, only the nucleotides corresponding to Allele A will be registered by the genotyping procedure, and they constitute the haplotype. A variety of nucleases, and combinations of nucleases, can be used for this method. For example, one nuclease may convert fragments with unprotected ends into single-stranded DNA molecules while another digests single-stranded DNA exo- or endonucleolytically. Specific nucleases useful for this method include E. coli Exonucleases I and III and Nuclease Bal-31 (which must be used with a suitable end-protection procedure at step (vi)). Also useful are the single-strand-specific Mung Bean Nuclease, human cytosolic 3′-to-5′ exonuclease and many other processive prokaryotic and eukaryotic exonucleases. Large segments are more attractive haplotyping targets than short ones, so the processivity of the nuclease may limit the utility of the method. Highly processive nucleases are therefore preferred. Such nucleases may be either natural or modified by mutagenesis. As with other haplotyping methods, a minimum differential allele enrichment that would be useful is 2:1, more preferably at least 5:1 and most preferably 10:1 or greater. It is also preferable to haplotype the polymorphic sites of interest on both alleles in separate reactions. Alternatively, if the haplotype of only one allele is determined directly, the other haplotype can be inferred by subtracting the known haplotype from the genotype of the total genomic DNA of the sample. Haplotypes can be extended over long regions by the combined use of several restriction fragment length polymorphisms suitable for the method as outlined above.
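The logic of steps (v) through (ix) can be modelled as a toy calculation. A fragment survives only if both of its termini were produced by RE1 and then blocked. The sketch below assumes complete digestion, perfect end blocking and a fully processive exonuclease, and the coordinates are hypothetical.

```python
# Minimal sketch of fragment survival in the two-enzyme/exonuclease scheme.
# Positions are hypothetical coordinates within one haplotyped region.


def surviving_fragments(re1_cuts, re2_cuts):
    """Fragments whose two termini were both produced by RE1 (and then blocked).

    Any fragment with an RE2-generated end is treated as fully digested by a
    processive exonuclease. Only fragments between two cuts are modelled.
    """
    cuts = sorted([(p, "RE1") for p in re1_cuts] + [(p, "RE2") for p in re2_cuts])
    return [
        (left, right)
        for (left, left_enzyme), (right, right_enzyme) in zip(cuts, cuts[1:])
        if left_enzyme == "RE1" and right_enzyme == "RE1"
    ]


def snps_reported(snp_positions, fragments):
    """SNP positions that lie inside a surviving fragment."""
    return [p for p in snp_positions if any(a < p < b for a, b in fragments)]


# Flanking RE1 sites at 0 and 10,000; the polymorphic RE1 site at 4,000 is
# present only on Allele A; RE2 cuts once, at 7,000, on both alleles.
allele_a = surviving_fragments([0, 4000, 10000], [7000])
allele_b = surviving_fragments([0, 10000], [7000])
print(allele_a, allele_b)  # [(0, 4000)] []
print(snps_reported([1500, 2500, 8000], allele_a))  # [1500, 2500]
```

Only the Allele A fragment lacking the RE2 site survives, so only SNPs inside it are read out as the haplotype.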

[0247] In the future, complete sequences will be available for many genomes, including the human genome, and hundreds of thousands if not millions of polymorphic sites will have been identified. It will then be possible to design RFLP-based assays for the methods described above in silico. That is, for any DNA segment of interest, one will be able to identify the flanking restriction sites for any available restriction enzyme, and the subset of those sites that are polymorphic in the human (or other) population. The design of RFLP assays can then be automated using criteria such as desired fragment location, desired fragment length, and the desired difference in length between the two alleles (for allele separation by size). Another such criterion is the location of a suitable site for RE2 (for allele enrichment by selective exonuclease digestion). In another aspect of this invention, a program can be executed that automatically designs experimental conditions, given the constraints just described. These conditions include the restriction endonucleases and either electrophoretic (or other) separation conditions or exonucleases.
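The in silico design step could start from something as simple as the sketch below. It scans two aligned allele sequences for a recognition site and finds sites present in only one allele. It then reports the bracketing shared sites and the expected fragment sizes for separation by size. This is a hypothetical illustration, with several simplifications. The sequences are toy data, and the EcoRI site GAATTC merely stands in for RE1. Cut positions are approximated by the start of the site, and only palindromic sites on one strand are searched.

```python
import re

# Minimal sketch of in-silico RFLP assay design. Cut positions are approximated
# by the start of each recognition site; real enzymes cut at fixed offsets.


def site_positions(seq, site):
    """Start positions of a palindromic recognition site (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def polymorphic_sites(allele_a, allele_b, site):
    """Sites present in exactly one allele (alleles must be aligned, SNPs only)."""
    a, b = set(site_positions(allele_a, site)), set(site_positions(allele_b, site))
    return sorted(a ^ b)


def size_separation_designs(allele_a, allele_b, site):
    """For each polymorphic site: bracketing shared sites and fragment sizes."""
    shared = sorted(set(site_positions(allele_a, site)) & set(site_positions(allele_b, site)))
    designs = []
    for p in polymorphic_sites(allele_a, allele_b, site):
        left = max((s for s in shared if s < p), default=0)
        right = min((s for s in shared if s > p), default=len(allele_a))
        designs.append({
            "polymorphic_site": p,
            "segment": (left, right),
            "cut_allele_fragments": (p - left, right - p),
            "uncut_allele_fragment": right - left,
        })
    return designs


# Hypothetical toy alleles; "GAATTC" (the EcoRI site) stands in for RE1.
LEFT, MIDDLE, RIGHT = "TTTTGAATTC" + "C" * 8, "G" * 8 + "GAATTC", "TTTT"
ALLELE_A = LEFT + "GAATTC" + MIDDLE + RIGHT   # carries the polymorphic site
ALLELE_B = LEFT + "GAGTTC" + MIDDLE + RIGHT   # SNP destroys the site
print(size_separation_designs(ALLELE_A, ALLELE_B, "GAATTC"))
```

In this toy case the cut allele gives two 14 bp pieces and the uncut allele a single 28 bp piece, which is the size difference a fractionation step would exploit. A practical tool would add real cut offsets, both strands and non-palindromic sites.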

[0248] In another aspect, a polymorphic restriction endonuclease site can be exploited for haplotyping in conjunction with an amplification step. First, a genomic DNA sample is treated with a restriction endonuclease that cleaves at a polymorphic site. Second, an amplification is performed spanning the restriction cleavage site. If one of the two alleles present in a DNA sample contains the recognition site for the enzyme, it will not be amplified, because every template molecule of that allele has been cleaved. The other allele will be amplified, and the amplification product can subsequently be diluted and subjected to a genotyping procedure. The set of genotypes obtained constitutes the haplotype of the allele lacking the polymorphic RE1 site. In this method, illustrated in FIGS. 25 and 26, the initial steps are as above: (i) select a DNA segment to be haplotyped (the exact boundaries will be constrained by the next step); (ii) identify a polymorphism, either within the segment or, preferably, in flanking DNA, that alters a recognition site for a restriction endonuclease (RE1); the outer bounds of the segment to be haplotyped are defined by the nearest occurrence of RE1 on either side of the polymorphic site; (iii) prepare genomic DNA from samples that are heterozygous for the polymorphism identified in step (ii); it is desirable that the average length of the genomic DNA be greater than the length of the DNA fragment being haplotyped; (iv) restrict the genomic DNA with RE1; (v) perform an amplification, for example a PCR amplification. The forward and reverse primers are located on opposite sides of the polymorphic RE1 site, but within the DNA segment subtended by the flanking, non-polymorphic RE1 sites; (vi) dilute the amplified DNA (optional; useful mainly if the genotyping procedure of the next step is amplification-based); (vii) subject the amplified DNA to genotyping tests for one or more polymorphisms that lie within the amplified segment.
Virtually any genotyping method will work. One preferred genotyping method (not requiring the dilution of step (vi)) is primer extension followed by electrophoretic or mass spectrometric analysis. Primers are positioned just upstream of one or more polymorphic sites in the amplified segment, extended in an allele-specific manner and analyzed using methods known in the art. This method can also be used in conjunction with the allele-specific priming methods of this invention in order to boost the specificity of allele amplification.
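The expected outcome of digestion followed by amplification can be predicted from sequence alone, as in this sketch. It assumes complete digestion, so any allele carrying an RE1 site within the amplicon yields no product. The alleles are aligned toy sequences and the SNP positions are hypothetical. The sketch does not model PCR efficiency.

```python
# Minimal sketch of restriction-then-amplify haplotyping. Alleles are toy,
# pre-aligned sequences; the RE1 site and SNP positions are hypothetical.


def amplifiable_alleles(alleles, site, amplicon):
    """Alleles with no RE1 site inside the amplicon (their templates stay intact)."""
    start, end = amplicon
    return sorted(name for name, seq in alleles.items() if site not in seq[start:end])


def haplotype_after_digest_pcr(alleles, site, amplicon, snp_positions):
    """Name and haplotype of the single allele expected to amplify."""
    survivors = amplifiable_alleles(alleles, site, amplicon)
    if len(survivors) != 1:
        raise ValueError(f"expected one amplifiable allele, got {survivors}")
    seq = alleles[survivors[0]]
    return survivors[0], "".join(seq[p] for p in snp_positions)


ALLELES = {
    "allele_1": "AAC" + "GAATTC" + "TTGCA",  # contains RE1 site: cleaved
    "allele_2": "AAT" + "GAGTTC" + "TTGTA",  # site destroyed: amplified
}
print(haplotype_after_digest_pcr(ALLELES, "GAATTC", (0, 14), [2, 5, 12]))
# ('allele_2', 'TGT')
```

Raising an error when zero or two alleles survive corresponds to the requirement that the sample be heterozygous at the RE1 polymorphism.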

[0249] Restriction endonuclease sites that flank the target segment can be exploited to produce optimally sized molecules for allele selection. For example, a heterozygous DNA sample can be restricted so as to produce two allelic DNA fragments that differ from one another by the presence or absence of a binding site for an allele-specific binding reagent. Restriction endonuclease digestion is easy to perform. It is also possible to cleave just outside the target DNA segment to be haplotyped, which produces the largest possible DNA fragment that differs only in the presence or absence of a single binding site. For these reasons, complete restriction is a preferred method for controlling the size of DNA segments prior to allele enrichment.

[0250] 8. Imaging-Based Haplotyping Methods

[0251] (a) Optical Mapping Technology

[0252] One way to determine haplotypes would be to use microscopy to directly visualize a double-stranded DNA molecule in which the sequence at polymorphic sites is revealed visually. David Schwartz and colleagues have developed a family of methods for the analysis of large DNA fragments on modified glass surfaces, which they refer to as optical mapping. Specifically, Schwartz and colleagues have devised methods for preparing large DNA fragments and fixing them to modified glass surfaces in an elongated state while preserving their accessibility to enzymes. The fixed molecules are stained and visualized microscopically, and their images are collected and processed to produce restriction maps of large molecules. (Lai Z, Jing J, Aston C, et al. A shotgun optical map of the entire Plasmodium falciparum genome. Nat Genet. 1999 Nov;23(3):309-13; Aston C, Mishra B, Schwartz DC. Optical mapping and its potential for large-scale sequencing projects. Trends Biotechnol. 1999 Jul;17(7):297-302; Aston C, Hiort C, Schwartz DC. Optical mapping: an approach for fine mapping. Methods Enzymol. 1999;303:55-73; Jing J, Reed J, Huang J, et al. Automated high resolution optical mapping using arrayed, fluid-fixed DNA molecules. Proc Natl Acad Sci USA. 1998 Jul 7;95(14):8046-51.) Many of the imaging and image analysis steps have been automated (see the articles cited above and Anantharaman T, Mishra B, Schwartz D. Genomics via optical mapping. III: Contiging genomic DNA. ISMB. 1999;(6):18-27). Many of the optical mapping methods have also been described in U.S. Pat. No. 5,720,928.

[0253] The optical mapping methods of Schwartz and colleagues have so far been largely confined to generating restriction maps of large DNA segments, or even whole genomes. This is done by treating immobilized, surface-bound double-stranded DNA molecules with restriction endonucleases. To a lesser extent, the methods have been applied to studies of DNA polymerase on single DNA molecules. For example, a complete BamH I and Nhe I restriction map of the genome of Plasmodium falciparum has been made using optical mapping. The average length of the analyzed fragments was 588-666 kb, and the average coverage of the map was 23X for Nhe I and 31X for BamH I. That is, on average, each nucleotide of the genome was present in 23 or 31 different analyzed fragments, and this high level of redundancy provides higher map accuracy. P. falciparum has a genome length of ~24.6 megabases (Mb). Taking into account the 31X redundancy of the BamH I map, ~763 Mb were therefore analyzed. The human genome, at ~3,300 Mb, is only about 4 times larger than the scale of this experiment (albeit at 1X coverage, which would be insufficient for highly accurate results). However, it should be possible, using a higher density of DNA fragments and/or a larger surface, to prepare glass slides with fragments corresponding to several equivalents of the human genome. Statistically reliable haplotyping results would be obtainable from such DNA preparations using the methods described below. As an alternative to whole-genome preparations, size-selected fractions of the genome or long-range amplification products could also be used for the haplotyping methods described below.
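The scale comparison in the preceding paragraph rests on two lines of arithmetic, using only the figures stated there:

```latex
\begin{aligned}
24.6\ \text{Mb} \times 31 &\approx 763\ \text{Mb analyzed (BamH I map)}\\
\frac{3{,}300\ \text{Mb}}{763\ \text{Mb}} &\approx 4.3
\end{aligned}
```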

[0254] It is an aspect of the present invention that optical mapping and related methods can be exploited for the determination of haplotypes. Several methods can be coupled with optical mapping technology to determine haplotypes: (i) restriction endonuclease digestion using enzymes that cleave at polymorphic sites on the DNA segment to be haplotyped; (ii) addition of PNAs corresponding to polymorphic sites to form allele-specific D-loops; (iii) addition of sequence-specific DNA binding proteins that recognize polymorphic sequences and consequently bind only to one set of alleles. The various types of allele-specific DNA binding proteins described above are all useful in this aspect. However, zinc finger proteins offer versatile sequence recognition and high-affinity binding, which make them a preferred class of DNA binding proteins; (iv) addition of other sequence-specific binding molecules described in this application.

[0255] A haplotyping method based on zinc fingers and optical mapping would consist of the following steps: (i) prepare fixed, elongated DNA molecules according to the methods of Schwartz; (ii) add zinc fingers that recognize polymorphisms in a DNA segment to be haplotyped. Preferably the zinc fingers are synthesized with a detectable label, for example by making a fusion protein, or alternatively they are post-translationally labelled. Preferably, different zinc fingers are labelled using two or more different methods that produce detectably different signals, whether by making fusion proteins or by post-translational chemical modification. Ideally at least two different labels are used for the zinc finger proteins. More than one type of signal is preferable because, when two or more zinc finger proteins are bound to a DNA molecule, the labels form a pattern. The pattern, together with the distances between the zinc finger proteins, provides a signature that helps identify the DNA molecule to which the proteins are bound.
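One hypothetical way to use the label pattern and inter-label distances as a signature is sketched below. Each candidate haplotype predicts which labelled zinc fingers bind and where. An imaged molecule is then compared to each prediction in both orientations, within a distance tolerance. The scoring rule, the 1 kb tolerance and the toy patterns are assumptions for illustration, not part of the described method.

```python
# Hypothetical scoring of an optical-mapping image: each haplotype predicts
# which labelled zinc fingers bind and where (kb along the molecule).


def signature(bound):
    """Label order plus inter-label distances for [(position_kb, label), ...]."""
    sites = sorted(bound)
    labels = tuple(label for _, label in sites)
    gaps = tuple(b[0] - a[0] for a, b in zip(sites, sites[1:]))
    return labels, gaps


def matches(observed, expected, tol_kb=1.0):
    """Compare signatures in both orientations, since molecules lie either way."""
    obs_labels, obs_gaps = signature(observed)
    exp_labels, exp_gaps = signature(expected)
    for labels, gaps in ((exp_labels, exp_gaps), (exp_labels[::-1], exp_gaps[::-1])):
        if labels == obs_labels and all(abs(o - e) <= tol_kb for o, e in zip(obs_gaps, gaps)):
            return True
    return False


def classify(observed, patterns, tol_kb=1.0):
    """Names of the predicted patterns consistent with one imaged molecule."""
    return [name for name, expected in patterns.items() if matches(observed, expected, tol_kb)]


PATTERNS = {  # toy predictions: the "green" finger binds only haplotype 1
    "haplotype_1": [(0, "red"), (12, "green"), (30, "red")],
    "haplotype_2": [(0, "red"), (30, "red")],
}
molecule = [(5, "red"), (23, "green"), (35.5, "red")]  # imaged in reverse orientation
print(classify(molecule, PATTERNS))  # ['haplotype_1']
```

Checking both orientations matters because an elongated molecule has no inherent direction on the slide. With two colours, the asymmetric spacing still identifies the haplotype.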

[0256] (b) Atomic Force Microscopy

[0257] In another aspect of the invention, atomic force microscopy can be used in a manner substantially similar to that described above for optical mapping. That is, detectable structures can be formed at polymorphic sites by addition of DNA binding proteins, preferably zinc finger proteins, or by forming other detectable complexes at polymorphic sites. Another method for forming detectable structures at polymorphic sites is strand invasion, preferably using PNA molecules. By appropriate design and optimization of PNA molecules, allele-specific strand invasion can be effected.

[0258] As with the haplotyping methods based on optical mapping, the haplotyped molecules may be either PCR products or genomic DNA fragments.

[0259] C. ApoE Genotypes and Haplotypes

[0260] The apolipoprotein E gene (ApoE) encodes a well-studied protein (APOE) central to lipoprotein metabolism. The existence of three major allelic forms of ApoE (referred to as e2, e3 and e4) has been known for over two decades. The well-established three-allele classification of ApoE is based on two polymorphisms in the coding sequence of the ApoE gene. Both result in cysteine vs. arginine amino acid polymorphisms in the APOE protein, at positions 112 and 158 of the mature protein. DNA-based diagnostic tests for ApoE have been available since the 1980s.

[0261] The ApoE e4 allele has been consistently correlated with elevated total cholesterol, elevated LDL cholesterol, low levels of ApoE protein and increased risk of coronary heart disease (CHD). The CHD risk attributable to e4 is apparent even after correcting for cholesterol levels and other CHD risk factors (smoking, obesity, diabetes, blood pressure). The e4 allele is also a risk factor for late-onset Alzheimer's disease (AD), apparently due to effects on the rate of disease progression. Compared with patients without an e4 allele, carriers of the ApoE e4 allele also have a poor prognosis in a variety of other diseases. These include neurological diseases (stroke, brain trauma, amyotrophic lateral sclerosis and other diseases) and psychiatric diseases (e.g. schizophrenia).

[0262] In addition to effects on disease risk and disease prognosis, there are reports that ApoE genotype predicts the response of AD patients to medications. In particular, the response of Alzheimer's disease patients to acetylcholinesterase inhibitors has been studied by several groups. ApoE genotype may also be useful for predicting patient response to other medical treatments, particularly treatments for neurological and cardiovascular diseases.

[0263] However, despite the many genetic associations described above, diagnostic tests for ApoE genotype are not widely used, and ApoE genotyping is not widely used for prognostic or pharmacogenetic testing either. On the contrary, a large number of studies address the limitations of ApoE as a diagnostic marker, particularly in the setting of AD diagnosis. Most of these studies conclude that testing for the e2, e3 and e4 alleles is not sufficiently sensitive or selective to justify use outside of clinical research. In many settings ApoE test results do not affect medical decision making, and concern has therefore also been expressed that there is little reason to obtain information on ApoE genotype.

[0264] Recent studies of the ApoE gene in a number of laboratories have led to the identification of several new DNA polymorphisms. The biological effects and medical import of these new polymorphisms have not been established, although some studies suggest that polymorphisms in the promoter affect ApoE transcription rates. Most published work has been limited to the analysis of individual polymorphisms, or sets of only a few polymorphisms, and their effect on one or two biological or clinical endpoints.

[0265] In the present application we describe multiple previously unreported polymorphisms in the ApoE gene. Also described are methods for determining the ApoE genotype and haplotype of unknown samples, e.g., using the methods of this invention. These new genotyping and haplotyping methods will enable more accurate measurement of how variation across the entire ApoE gene (promoter, exons, introns and flanking DNA) contributes to several important variables. These include serum cholesterol, CHD risk, AD risk, the prognosis of patients with neurodegenerative diseases or brain trauma, patient responses to various treatments, and other medically important variables described herein. These improved ApoE tests may provide the sensitivity and selectivity required to develop successful diagnostic, prognostic or pharmacogenetic tests for neurological, psychiatric or cardiovascular disease. Such tests could be used either alone or in combination with genetic tests for other relevant genes.

[0266] Apolipoproteins are found on the surface of various classes of lipoproteins, membrane-bound particles that transport lipids (mainly cholesterol and triglycerides) throughout the body, including the brain. The function of apolipoproteins is to direct lipoproteins to specific cells that require lipids, for example cells that store fat. The apolipoproteins bind to specific receptors on the surface of lipid-requiring cells, thereby directing the transport of lipids to the target cell. Apolipoprotein E is one of about a dozen apolipoproteins on blood lipoproteins, but it is the major apolipoprotein in the brain.

[0267] One important function of ApoE in the brain is to transport lipids to cells that are performing membrane synthesis, which often occurs in response to acute or chronic brain injury. After injury, there is usually extensive synaptic remodeling as the surviving neurons receive new inputs from cells that were formerly wired to injured cells. This neuronal remodeling, or plasticity, is an important part of the physiologic response to the disease process and modulates the course of disease. Patients with low ApoE levels or impaired ApoE function have impaired neuronal plasticity. In Alzheimer's disease, the injured brain regions include the cholinergic pathways of the basal forebrain and elsewhere. The degree of neuronal remodeling in such areas may affect the response to cholinomimetic therapy. Thus impaired brain lipid transport alters patterns of neuronal remodeling in cholinergic (and other) pathways. It thereby potentially affects the response to acetylcholinesterase inhibitors and possibly other cholinergic agonists.

[0268] The ApoE4 allele is a major risk factor for Alzheimer's disease. This may be because it is expressed in the brain at lower levels than the E2 or E3 alleles and thus impairs neuronal remodeling. The E2 allele is mildly protective for AD. Several clinical trials of Alzheimer's disease drugs, including both acetylcholinesterase inhibitors and vasopressinergic agonists, have shown significant interactions with ApoE genotype and sex. The E4 allele has been associated with lack of response to acetylcholinesterase inhibitors.

[0269] The relative risk of AD conferred by the E4 allele varies almost ten-fold between different populations. The highest relative risk (RR) has consistently been reported in the Japanese, where E4/E4 homozygotes have a 30-fold RR relative to E3/E3 homozygotes. African and Hispanic E4/E4 homozygotes have relative risks of only ~3-4-fold. On the other hand, in the presence of an E4 allele the cumulative risk of AD to age 90 is similar in all three groups (Japanese, Hispanics and Africans). This suggests that other factors contribute significantly to the causation of AD in the non-Japanese populations. It may be that these non-E4 AD patients are the best responders to acetylcholinesterase inhibitors. If true, this may account for a lack of response in Japanese patients, in whom the fraction of ApoE4-mediated AD appears to be the highest in the world.

[0270] It is well established that the three common alleles at the ApoE locus are correlated with risk of AD in various populations. Recent studies have also shown that ApoE genotype correlates with the response of AD patients to two classes of drugs. Specifically, Poirier et al. demonstrated an interaction of ApoE genotype, sex and response of AD patients to the cholinomimetic drug tacrine. Richard et al. showed an interaction between ApoE genotype and response to an investigational noradrenergic/vasopressinergic agent, S12024. In both studies the analysis was restricted to the two amino acid variances that determine the three common ApoE alleles. Other variances that may affect ApoE function have been described at the ApoE locus, including promoter variances. Studies have also been published associating polymorphisms in other genes with risk of late-onset AD. However, there have been no investigations of the effect of variation at these loci on response to cholinomimetic drugs.

[0271] There are two FDA-approved drugs for therapy of Alzheimer's disease (tacrine, donepezil), and at least a dozen additional agents are in late-stage clinical trials or under FDA review. The FDA-approved drugs work by inhibiting acetylcholinesterase, thereby boosting brain acetylcholine levels. This symptomatic therapy provides modest benefit to fewer than half of treated patients but does not affect disease progression. The products in the pipeline likewise partially reverse symptoms without affecting the underlying disease process. Available evidence suggests that they, too, will be of modest benefit to some patients. Despite their limited efficacy, these drugs will be expensive. They will also likely be associated with serious adverse effects in some patients. As a result, the cost of providing a modest benefit to a limited number of Alzheimer's disease patients will be high.

[0272] As more AD therapeutics become available, physicians will face the difficult task of differentiating between multiple products. These products may produce similar response rates in a population. However, the crucial decision clinicians face is selecting the appropriate therapeutic for each individual AD patient at the time of diagnosis. This is particularly the case if there are several therapeutic choices, only one of which may be optimal for a particular patient. This selection is critical because failure to provide optimal treatment at the time of diagnosis may reduce the patient's level of function during the period when the greatest benefit could be achieved. Inadequate treatment may continue for some time because measures of clinical response in AD are notoriously imprecise. Six months or longer may pass before it is clear whether a drug is working to a significant degree. During this time the disease continues to progress, which may limit the efficacy of a second drug or therapeutic regimen. A test that could predict likely responders to one or more AD drugs would thus be of great value in optimizing patient care and reducing the cost of ineffective treatment.

[0273] Data have been published suggesting that ApoE genotype may be such a test. Specifically, Farlow, Poirier and colleagues have shown that female patients with the APOE ε4 allele do not respond to tacrine, while female patients with the ε2 and ε3 alleles have a significant response. Males do not respond significantly regardless of genotype. Conversely, Richard et al. have demonstrated that patients with the ε4 allele, but not those with the ε2 and ε3 alleles, have a statistically significant response to S12024, an enhancer of vasopressinergic/noradrenergic signalling. Thus the two drugs, one an acetylcholinesterase inhibitor and the other a vasopressinergic/noradrenergic agonist, are useful in different groups of patients, delimited by ApoE genotype.

[0274] The ability to predict response to therapy for progressive, debilitating diseases like AD would be of enormous clinical importance. There is generally only one opportunity to treat patients with these diseases while they are at their maximal level of functioning. Any delay in selecting optimal therapy therefore represents a lost opportunity to preserve the maximal possible level of function. With multiple drugs in development for AD, it will become increasingly important to predict the best drug for each patient.

[0275] Screening the ApoE Gene for Variation

[0276] In order to better understand genetically encoded functional variation in the ApoE gene and its encoded product, we systematically cataloged genetic variation at the ApoE locus. The ApoE genomic sequence is represented in GenBank accession AB012576. The gene is composed of four exons and three introns. The transcription start site (beginning of the first exon) is at nucleotide (nt) 18,371 of GenBank accession AB012576. The end of the transcribed region (end of the 3′ untranslated region, excluding the polyA tract) is at nt 21,958 (Table 2).

[0277] We designed PCR primer pairs to cover the ApoE genomic sequence from nucleotides 16,382-23,984. Thus, our analysis began 1,989 nucleotides upstream of the transcription start site, extended across the entire gene and ended 2,026 nucleotides after the final exon. This segment of DNA was chosen to uncover any polymorphisms that might have one of several kinds of effect. Some might affect upstream, downstream or intragenic transcriptional regulatory sequences. Others might alter transcribed sequences so as to affect RNA processing (splicing, capping, polyadenylation), mRNA export, translation efficiency, mRNA half-life, or interactions with mRNA regulatory factors. Still others might affect amino acid coding sequences.
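The offsets quoted above follow directly from the AB012576 coordinates given in the text. The sketch below reproduces them and classifies positions as upstream, transcribed or downstream. The +1/no-zero numbering relative to the transcription start site is an assumed convention, not one stated in the text.

```python
# Coordinates are nucleotide positions in GenBank AB012576, as given in the text.
TSS = 18_371      # transcription start site (start of exon 1)
TX_END = 21_958   # end of the 3' UTR, excluding the polyA tract
SCREEN = (16_382, 23_984)  # region covered by the PCR primer pairs


def relative_to_tss(nt):
    """Offset from the TSS; +1 is the TSS and there is no position 0 (assumed convention)."""
    return nt - TSS + 1 if nt >= TSS else nt - TSS


def region(nt):
    """Coarse location of a position relative to the transcribed region."""
    if nt < TSS:
        return "upstream"
    if nt <= TX_END:
        return "transcribed"
    return "downstream"


print(TSS - SCREEN[0], SCREEN[1] - TX_END)  # 1989 2026
print({nt: region(nt) for nt in (16747, 21250, 23707)})
```

The same helpers can place the SNP positions used in Tables 4 and 5 relative to the gene.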

[0278] Separately, the ApoE cDNA was screened for polymorphism. The ApoE cDNA sequence was obtained from GenBank accession K00396, which covers 1,156 nt. Nucleotides 43 through 1,129 were screened by DNA sequencing.

[0279] We also searched for polymorphisms in a putative ApoE enhancer element located ~15 kb 3′ of the end of the ApoE gene, in the expectation that polymorphisms in a regulatory element might affect ApoE levels. The enhancer sequence is in the same GenBank accession as the ApoE gene (AB012576). The segment screened for polymorphism extends from nt 36,737 to 37,498.

[0280] Exemplary polymorphism screening methods are described in Example 3. Briefly, a panel of 32 subjects of varying geographic, racial and ethnic background was selected for screening.

[0281] A total of 20 polymorphic sites were identified, several of which correspond to polymorphisms previously reported in the literature (see Table 2). We also report unique haplotypes that have been observed with these polymorphisms. Table 3 shows an analysis of the haplotypes present in a subset of nine polymorphic sites. These haplotypes were determined using the methods described in detail in Example 1.

[0282] Table 4 provides the sequences of 42 additional haplotypes of the ApoE gene. In any given haplotype, the ApoE sequence between the listed nucleotides (e.g. between 16,541 and 16,747) is generally identical to that in GenBank AB012576. There may, however, be additional polymorphic sites not listed in this table. Such additional variant sites do not lessen the utility of the haplotypes provided. In some cases no sequence is provided at a particular site in a particular haplotype (e.g. position 18145 of haplotype 4). It is then understood that either of the two nucleotides that appear elsewhere in the column (T or G under column 18145) could appear at the indicated site.

[0283] Other haplotypes of the ApoE gene are shown in Table 5, which presents a useful group of haplotypes. These haplotypes are specified by SNPs at positions 16747, 17030, 17785, 19311 and 23707 (rows 1-4 of the table). They may also be specified by SNPs at a subset of these positions: 17785, 19311 and 23707 (rows 5-8); 17030, 19311 and 23707 (rows 9-12); 16747, 19311 and 23707 (rows 13-16); 17030, 17785 and 23707 (rows 17-20); 16747, 17030, 19311 and 23707 (rows 21-24); or 16747, 17785, 19311 and 23707 (rows 25-28). One useful aspect of these haplotypes is that they closely parallel the classic phenotypes, as indicated in the column on the far right. That is, the haplotype GCAGC in row 1 identifies the alleles designated E3 by the classic ApoE test. GCAGA, in row 3, specifies the alleles designated E4 by the classic ApoE test, and GCAGA, in row 4, identifies the alleles designated E2 by the classic ApoE test. The haplotypes in rows 5-28 are simpler versions of those in rows 1-4, with the corresponding classic ApoE genotype/phenotype indicated in the GENOTYPE column. It should be noted that the polymorphisms that specify the classic ApoE alleles are encoded by nucleotides 21250 and 21388. Nucleotide 21250 is the first position of codon 112 of the mature ApoE protein, and nucleotide 21388 is the first position of codon 158. Nucleotides 21250 and 21388 are not elements of the haplotypes specified in Table 5. In other words, the haplotypes in Table 5 are based upon SNPs that are completely different from those underlying current ApoE allele classifications and genotype/haplotype tests. Thus, a haplotype or pair of haplotypes may be determined in a sample by examining any of the combinations of SNPs provided in Table 5. Such a determination constitutes a novel method for determining the classic ApoE genotype/phenotype status of a sample.
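For comparison with the Table 5 haplotypes, the classic allele call from nucleotides 21250 and 21388 can be written as a small lookup. The codon assignments (TGC for cysteine and CGC for arginine at codons 112 and 158) are standard ApoE facts rather than data from this document. Leaving the rare Arg112/Cys158 combination unclassified is a simplification of this sketch.

```python
# Decode the classic ApoE alleles from the two coding SNPs named in the text:
# nt 21250 (first base of codon 112) and nt 21388 (first base of codon 158).
# Codon assignments are standard: TGC encodes Cys and CGC encodes Arg.
CLASSIC = {
    ("T", "T"): "e2",  # Cys112, Cys158
    ("T", "C"): "e3",  # Cys112, Arg158
    ("C", "C"): "e4",  # Arg112, Arg158
}


def classic_allele(base_21250, base_21388):
    """Classic allele for one chromosome; Arg112/Cys158 is left unclassified."""
    return CLASSIC.get((base_21250, base_21388), "unclassified")


def classic_genotype(hap1, hap2):
    """Genotype from two phased (base_21250, base_21388) haplotypes."""
    return "/".join(sorted([classic_allele(*hap1), classic_allele(*hap2)]))


# Both pairs below give the same unphased calls (T/C at 21250, T/C at 21388),
# which is why phase information matters for double heterozygotes.
print(classic_genotype(("T", "T"), ("C", "C")))  # e2/e4
print(classic_genotype(("T", "C"), ("C", "T")))  # e3/unclassified
```

The two example pairs share identical unphased calls. Phased haplotype information is therefore informative even for the classic two-site test.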

[0284] Preferably, a haplotype or haplotypes specified in Table 5 are determined in conjunction with at least one additional ApoE SNP specified herein (see Table 4), to constitute a new set of haplotypes.

[0285] Preferably, the at least one additional SNP (beyond those in Table 5) divides at least one of the three classical ApoE phenotypes into two haplotype groups. For example, addition of the C/T polymorphism at nucleotide 21349 to the group in Table 5 divides the E3-like haplotypes into two groups: those with C at 21349 and those with T at 21349. Addition of the T/C polymorphism at nucleotide 17937 to those in Table 5 divides the E2-like haplotypes into two groups: those with a T at 17937 and those with a C at 17937. Such subgroups are more likely to correspond to biologically and clinically homogeneous populations than the classic ε2, ε3, ε4 classification.
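The subdivision of a classic phenotype by one additional SNP can be sketched in Python. The sketch assigns the classic allele from nucleotides 21250 and 21388 (T encodes Cys and C encodes Arg at both sites, per the amino-acid annotations in the [0303] table; E2 is Cys/Cys, E3 is Cys/Arg, E4 is Arg/Arg) and then groups haplotypes by the base at the added site. The haplotype strings are transcribed from the [0303] table with spaces removed; the function names are hypothetical.

```python
from collections import defaultdict

# Column order of the genotype/haplotype table in [0303] (AB012576 positions).
POSITIONS = (17874, 17937, 18145, 18476, 19311, 20334, 21250, 21349, 21388)

# Classic alleles from nt 21250 (codon 112) and nt 21388 (codon 158).
CLASSIC = {("T", "T"): "E2", ("T", "C"): "E3", ("C", "C"): "E4"}


def classic_allele(haplotype: str) -> str:
    sites = dict(zip(POSITIONS, haplotype))
    return CLASSIC.get((sites[21250], sites[21388]), "unclassified")


def subgroups(haplotypes, split_position: int) -> dict:
    """Group haplotypes by (classic allele, base at split_position)."""
    column = POSITIONS.index(split_position)
    groups = defaultdict(set)
    for haplotype in haplotypes:
        groups[(classic_allele(haplotype), haplotype[column])].add(haplotype)
    return dict(groups)


# Haplotypes transcribed from the [0303] table (spaces removed).
observed = ["ATGGGGTCC", "ATGGGGTTC", "ACGGGGTCT", "TTGGGGTCT", "ATTGAGCCC"]
print(subgroups(observed, 21349))  # E3 split by C/T at 21349
print(subgroups(observed, 17937))  # E2 split by T/C at 17937
```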

EXAMPLES

Example 1 Haplotyping Method Using Hairpin-Inducing Primers for Allele-Specific PCR

[0286] A primer is designed which contains at least two different regions. The 3′ portion of the primer corresponds to the template DNA to be amplified. The length of this region of the primer can vary but should be sufficient to impart the specificity required to amplify only the region of cDNA or genomic DNA of interest. Additional nucleotides, complementary to the region of the sequence that contains the nucleotide variance, are added to the 5′ end of the primer. Following two rounds of PCR, the added tail region of the primer is incorporated into the sequence. If the correct nucleotide is present at the site of variance, the incorporated nucleotides cause the reverse strand, which is complementary to the primer strand, to form a hairpin loop. The hairpin loop structure inhibits annealing of new primers and thus inhibits further amplification.
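The sequence requirement for the 5′ tail can be sketched in Python: the tail must be the reverse complement of the variant-containing window so that the two form an inverted repeat, the feature that allows a strand to fold into a hairpin. This sketch only counts stem mismatches; it is not a thermodynamic model such as the Oligo4 predictions used in [0287]. The window sequence is hypothetical, not the DPD sequence, and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_tail(window: str, variant_index: int, base: str) -> str:
    """Return a 5' tail that base-pairs perfectly with `window` carrying `base`."""
    allele_window = window[:variant_index] + base + window[variant_index + 1:]
    return reverse_complement(allele_window)


def tail_mismatches(tail: str, window: str) -> int:
    """Count unpaired stem positions if `tail` folds back onto `window`."""
    if len(tail) != len(window):
        raise ValueError("tail and window must be the same length")
    return sum(a != b for a, b in zip(tail.upper(), reverse_complement(window)))


# Hypothetical 9-nt window with the variant at index 4 (T or C).
tail = design_tail("GGACTTCAG", 4, "T")
print(tail)                                # CTGAAGTCC
print(tail_mismatches(tail, "GGACTTCAG"))  # 0: perfect stem, hairpin favored
print(tail_mismatches(tail, "GGACCTCAG"))  # 1: mismatched stem, weaker hairpin
```

Under the logic of [0288], a tail that pairs perfectly with the T allele suppresses amplification of that allele and therefore favors amplification of the C allele.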

[0287] Primers with the above characteristics were designed for haplotyping of the dihydropyrimidine dehydrogenase (DPD) gene. The DPD gene has two sites of variance in the coding region, at base 186 (T:C) and base 597 (A:G), which result in the amino acid changes Cys:Arg and Met:Val, respectively (FIG. 27). The second site, at base 597, is a restriction fragment length polymorphism (RFLP) that is cleaved by the enzyme BsrD I if the A allele is present. Primers were designed that would amplify one or the other allele depending on which base was present at the site of variance at base 186 (FIG. 28). The bases in white correspond to the region of the primer that is complementary to the DPD sequence. The green base is the variant base in the target sequence. The bases in red and blue are bases added to the 5′ end of the primer, which should form a hairpin loop following incorporation into the PCR product. The blue base is the added base that hybridizes to the variant base and is responsible for the allele discrimination of the hairpin loop. The DPDNSF primer contains only the DPD-complementary sequence and will not result in allele-specific amplification. FIG. 29 shows hybridization of the non-specific DPDNSF primer to both the T and A allele of the DPD target sequence, and the 5′ end of the PCR product generated by amplification using this primer. FIGS. 30 and 31 are the corresponding diagrams for primers DPDASTF and DPDASCF. Note that the added bases are incorporated into the PCR fragment following amplification. FIG. 32 shows the most stable hairpin loop structures, as predicted with the computer program Oligo4, formed with the reverse strand of the PCR product made using the DPDNSF primer. Only the reverse strand is shown because this is the strand to which the DPDNSF primer would hybridize in subsequent rounds of amplification. These hairpin loops are either not stable or have a low melting temperature. FIGS. 33 and 34 are the corresponding diagrams for the hairpin loops formed in the reverse strands of the PCR products generated using primers DPDASCF and DPDASTF, respectively. Amplification of the T allele using primer DPDASCF yields a product that can form a very stable hairpin loop with a melting temperature of 83° C. (FIG. 33). In contrast, amplification of the C allele with primer DPDASCF generates a hairpin loop with a melting temperature of only 42° C. The converse is true for the primer DPDASTF: amplification of the C allele of DPD results in the formation of a very stable hairpin loop (100° C.), while amplification of the T allele results in the formation of a much less stable hairpin (42° C.) (FIG. 34).

[0288] FIGS. 35, 36 and 37 depict the primer hybridization and amplification events when further amplification is attempted on the generated PCR fragments. The DPDNSF primer is able to compete effectively with the hairpin structures formed with both the T and C alleles of the DPD gene, and thus amplification of both alleles proceeds efficiently (FIG. 35). The DPDASCF primer (FIG. 36) is able to compete for hybridization with the hairpin loop formed with the C allele because the primer's melting temperature is higher than that of the hairpin loop (60° C. compared to 42° C.). The hairpin loop formed on the T allele, however, has a higher melting temperature than the primer and thus competes effectively with the primer for hybridization. The hairpin loop inhibits PCR amplification of the T allele, which results in allele-specific amplification of the C allele. The reverse is true for the primer DPDASTF (FIG. 37): the hairpin loop structure has a higher melting temperature than the primer for the C allele and a lower melting temperature than the primer for the T allele. This inhibits primer hybridization and elongation on the C allele and results in allele-specific amplification of the T allele.
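The competition described above reduces, in its simplest form, to comparing the primer melting temperature with the hairpin melting temperature on each allele's product. The Python sketch below encodes that simplified rule; it deliberately ignores annealing temperature, reaction kinetics and salt conditions. The class and function names are hypothetical, and because the text does not state a primer Tm for DPDASTF, the 60° C. used for it is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Product:
    allele: str
    hairpin_tm_c: float  # melting temperature of the product's hairpin, °C


def amplified_alleles(primer_tm_c: float, products: List[Product]) -> List[str]:
    """Alleles on which the primer out-competes the hairpin (primer Tm > hairpin Tm)."""
    return [p.allele for p in products if primer_tm_c > p.hairpin_tm_c]


# DPDASCF figures from the text: primer 60 °C; hairpins 83 °C (T) and 42 °C (C).
print(amplified_alleles(60.0, [Product("T", 83.0), Product("C", 42.0)]))  # ['C']

# DPDASTF hairpins from the text: 100 °C (C) and 42 °C (T).
# Assumed primer Tm of 60 °C, for illustration only.
print(amplified_alleles(60.0, [Product("C", 100.0), Product("T", 42.0)]))  # ['T']
```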

[0289] The use of this approach for haplotyping is diagrammed in FIG. 28 using a cDNA sample whose haplotype is known to be Allele 1 = T¹⁸⁶:A⁵⁹⁷ and Allele 2 = C¹⁸⁶:G⁵⁹⁷. The sizes of the fragments generated by BsrD I digestion of the 597 bp product amplified with the primer DPDNSF, DPDASTF, or DPDASCF depend on whether the base at site 597 is an A or a G. Cleavage by BsrD I at the polymorphic site indicates that an A is present at site 597. If a product has an A at 597, three fragments of 138, 164 and 267 bp are generated. If a G is present at site 597, only two fragments, of 164 and 405 bp, are generated. If a sample is heterozygous for A and G at site 597, all four fragment sizes are generated: 138, 164 (from both alleles), 267 and 405 bp. The expected fragments generated by BsrD I restriction for each of the primers are indicated in the box in FIG. 38.
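The expected gel patterns follow directly from the fragment sizes given above. A minimal Python sketch of the lookup, with the function name `expected_bands` chosen for illustration:

```python
# BsrD I fragment sizes (bp) for each base at DPD site 597, from the text.
FRAGMENTS = {"A": (138, 164, 267), "G": (164, 405)}


def expected_bands(bases_at_597) -> list:
    """Distinct band sizes expected on the gel, smallest first.

    The 164 bp fragment is produced by both alleles, so a heterozygote shows
    four distinct bands rather than five.
    """
    sizes = set()
    for base in bases_at_597:
        sizes.update(FRAGMENTS[base])
    return sorted(sizes)


print(expected_bands("A"))   # [138, 164, 267]
print(expected_bands("G"))   # [164, 405]
print(expected_bands("AG"))  # [138, 164, 267, 405]
```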

[0290] FIG. 39 shows an agarose gel in which each of the primers was used to amplify the cDNA sample heterozygous at both sites 186 and 597, followed by BsrD I restriction. The DPDNSF lane shows the restriction fragment pattern for the selected cDNA using the DPDNSF primer, indicating that this sample is indeed heterozygous at site 597. However, using the same cDNA sample and the primer DPDASTF (DPDASTF lane), the restriction pattern corresponds to the pattern representative of a sample that is homozygous for A at site 597. Because the DPDASTF primer allows amplification of only the T allele, the haplotype of that allele in the sample must be T¹⁸⁶:A⁵⁹⁷. The restriction digest pattern using the primer DPDASCF (DPDASCF lane) corresponds to the expected pattern for G at site 597. Amplification of the cDNA sample with the primer DPDASCF results in amplification of only the C allele in the sample; thus the haplotype of this allele must be C¹⁸⁶:G⁵⁹⁷. This demonstrates that primers can be designed to incorporate into a PCR product a sequence capable of forming a hairpin loop structure that inhibits PCR amplification of one allele but not the other, even if there is only a single base pair difference between the two alleles. This can be exploited for allele-specific amplification and thus for haplotyping of DNA samples.
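The phasing logic of this paragraph can be written as a short Python sketch: each allele-specific lane fixes the base at site 186 (through the primer used), and its band pattern fixes the base at site 597. The names `call_597` and `phase` are hypothetical, and the band sets are those described in [0289] and [0290] for the FIG. 39 lanes.

```python
# Allele at site 186 selectively amplified by each allele-specific primer ([0288]).
PRIMER_ALLELE_186 = {"DPDASTF": "T", "DPDASCF": "C"}


def call_597(bands) -> str:
    """Infer the base(s) at site 597 from BsrD I band sizes (bp)."""
    has_a = {138, 267} <= set(bands)
    has_g = 405 in bands
    if has_a and has_g:
        return "A/G"
    if has_a:
        return "A"
    if has_g:
        return "G"
    raise ValueError("band pattern does not match either allele")


def phase(lanes: dict) -> dict:
    """Assign a site-186:site-597 haplotype to each allele-specific lane."""
    result = {}
    for primer, bands in lanes.items():
        base_597 = call_597(bands)
        if "/" in base_597:
            raise ValueError(f"{primer} lane is not allele-specific")
        result[primer] = f"{PRIMER_ALLELE_186[primer]}186:{base_597}597"
    return result


# DPDNSF lane omitted: it amplifies both alleles and only confirms heterozygosity.
print(phase({"DPDASTF": {138, 164, 267}, "DPDASCF": {164, 405}}))
# {'DPDASTF': 'T186:A597', 'DPDASCF': 'C186:G597'}
```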

[0291] Alternatively, it may also be possible to form a hairpin structure at the 5′ end of the PCR product that is stable enough to keep the polymerase from extending through the region. This may be achieved by incorporating into the primer modified nucleotides or structures that, when hybridized to the correct base, form a structure stable enough to inhibit read-through by a polymerase.

[0292] This invention is meant to cover any method in which a stable secondary structure that inhibits further PCR amplification is formed in one or both strands of a PCR product. The secondary structure is formed only when the correct base or bases are present at a known site of variance. The secondary structure is not formed when the incorrect base or bases are present in the PCR product at the site of variance, allowing further amplification of that product. This allows the specific amplification of one of the two possible alleles in a sample, and thus the haplotyping of that allele.

Example 2 Genotyping of an ApoE Variance by Mass Spectrometry Analysis of Restriction Enzyme Generated Fragments

[0293] The following example describes the genotyping of the variance at genomic site 21250 in the ApoE gene, which is a T:C variance resulting in a cysteine to arginine amino acid change at amino acid 130 of the ApoE precursor protein (codon 112 of the mature protein). Two primers were designed both to amplify the target region of the ApoE gene and to introduce two restriction enzyme sites (Fok I, Fsp I) into the amplicon adjacent to the site of variance. FIG. 40 shows the sequence of the primers and the target DNA. The ApoE21250-LFR primer is the loop primer, which contains the restriction enzyme recognition sites, and the ApoE21250-LR primer is the reverse primer used in the PCR amplification process. The polymorphic nucleotide is shown in italics. The following components were mixed together in a 200 μl PCR tube for each genotyping reaction. All volumes are given in μl.
A. 10x PCRx buffer (Gibco/BRL, cat# 11509-015): 2
B. 2 mM dNTP mix: 2
C. 50 mM MgSO₄: 0.8
D. PCR enhancer (Gibco/BRL, cat# 11509-015): 4
E. 20 μM ApoE21250-LFR primer: 1
F. 20 μM ApoE21250-LR primer: 1
G. Patient genomic DNA, 20 ng/μl: 0.5
H. Platinum Taq DNA polymerase (Gibco/BRL, cat# 11509-015): 0.1
I. Deionized water: 8.6
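The final concentrations in this 20 μl reaction follow from C₁V₁ = C₂V₂. A minimal Python sketch of the calculation is shown below; the data layout and the name `summarize` are illustrative. The text does not say whether "2 mM dNTP mix" means 2 mM of each dNTP, so the computed 0.2 mM refers to the mix as labeled.

```python
# (component, stock concentration, unit, volume in μl) from the list above;
# None marks components whose stock concentration is not stated.
RECIPE = [
    ("10x PCRx buffer", 10.0, "x", 2.0),
    ("dNTP mix", 2.0, "mM", 2.0),
    ("MgSO4", 50.0, "mM", 0.8),
    ("PCR enhancer", None, None, 4.0),
    ("ApoE21250-LFR primer", 20.0, "μM", 1.0),
    ("ApoE21250-LR primer", 20.0, "μM", 1.0),
    ("genomic DNA", 20.0, "ng/μl", 0.5),
    ("Platinum Taq DNA polymerase", None, None, 0.1),
    ("deionized water", None, None, 8.6),
]


def summarize(recipe):
    """Return total volume (μl) and final concentration (or DNA mass) per component."""
    total_ul = sum(volume for _, _, _, volume in recipe)
    final = {}
    for name, stock, unit, volume in recipe:
        if stock is None:
            continue
        if unit == "ng/μl":
            final[name] = (stock * volume, "ng")  # mass of template added
        else:
            final[name] = (stock * volume / total_ul, unit)  # C1*V1 / V_total
    return total_ul, final


total_ul, final = summarize(RECIPE)
print(round(total_ul, 2))  # 20.0
for name, (value, unit) in final.items():
    print(f"{name}: {value:.2f} {unit}")
```

The output corresponds to 1x buffer, 2 mM MgSO₄, 1 μM of each primer and 10 ng of genomic DNA per reaction.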

[0294] The reactions were cycled through the following steps in MJ Research PTC-200 thermocyclers:
A. 94° C., 1 min, 1 cycle
B. 94° C., 15 sec (steps B-D, 45 cycles)
C. 55° C., 15 sec
D. 72° C., 30 sec
E. 15° C., hold indefinitely
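The programmed hold time of this cycling protocol can be checked with a short Python sketch. It sums only the programmed holds; ramp times between temperatures and the indefinite 15° C. hold are excluded, so actual run time on the instrument will be longer. The variable and function names are illustrative.

```python
# (temperature in °C, hold in seconds) for each step of the program above.
INITIAL_STEPS = [(94, 60)]
CYCLED_STEPS = [(94, 15), (55, 15), (72, 30)]
CYCLES = 45


def programmed_hold_seconds(initial, cycled, cycles: int) -> int:
    """Sum of programmed hold times; ramps and the final 15 °C hold are excluded."""
    return sum(s for _, s in initial) + cycles * sum(s for _, s in cycled)


seconds = programmed_hold_seconds(INITIAL_STEPS, CYCLED_STEPS, CYCLES)
print(seconds, seconds / 60)  # 2760 46.0
```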

[0295] The sequence of the amplicon for both the T allele and the C allele following amplification is shown in FIG. 35. Five μl of each reaction were removed and analyzed by agarose gel electrophoresis to ensure the presence of sufficient PCR product of the correct size. The following components were mixed together for restriction enzyme cleavage of the DNA. Platinum Taq antibody (Taquench, Gibco/BRL cat# 10965-010) was added to inhibit any potential filling in of the 3′ recessed end created by Fok I cleavage. All volumes are in μl.
A. 10x New England Biolabs buffer #2: 2
B. Fok I, 4 units/μl (New England Biolabs, cat# 109S): 0.3
C. Fsp I, 5 units/μl (New England Biolabs, cat# 135S): 0.2
D. Platinum Taq antibody (Gibco/BRL, cat# 11509-015): 0.2
E. PCR reaction: 15
F. Deionized water: 2.4
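The enzyme units delivered and the total digest volume follow from the list above. A minimal Python sketch, with illustrative names:

```python
# (component, enzyme activity in units/μl or None, volume in μl) from the list above.
DIGEST = [
    ("10x NEB buffer #2", None, 2.0),
    ("Fok I", 4.0, 0.3),
    ("Fsp I", 5.0, 0.2),
    ("Platinum Taq antibody", None, 0.2),
    ("PCR reaction", None, 15.0),
    ("deionized water", None, 2.4),
]


def enzyme_units(mix) -> dict:
    """Units of each enzyme delivered to the digest (activity x volume)."""
    return {name: activity * volume for name, activity, volume in mix if activity is not None}


def total_volume(mix) -> float:
    return sum(volume for _, _, volume in mix)


print(enzyme_units(DIGEST))            # about 1.2 U Fok I and 1.0 U Fsp I
print(round(total_volume(DIGEST), 2))  # 20.1
```

The 15 μl of PCR reaction corresponds to what remains of the 20 μl amplification after the 5 μl gel check.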

[0296] The above reactions were incubated at 37° C. for 1 hour. The cleavage sites for each amplicon are shown. Following incubation, the reactions were purified by solid phase extraction and eluted in 100 μl of a 70% acetonitrile/water mix. The samples were dried in a Savant AES 2010 speed vac for 1 hour under vacuum and heat. The samples were resuspended in 3 μl of matrix (65 mg/ml 3-hydroxypicolinic acid, 40 mM ammonium citrate, 50% acetonitrile) and spotted on a PerSeptive Biosystems 20×20 Teflon-coated plate. Samples were analyzed on a PerSeptive Biosystems Voyager-DE Biospectrometry™ Workstation.
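Genotype calling from the spectra amounts to checking which allele-specific fragment masses are matched by observed peaks. The Python sketch below is one way to do this under stated assumptions: the expected masses are placeholders (the actual fragment masses depend on the FIG. 40 sequences, which are not reproduced here), the 3 Da default tolerance is an assumption, and `call_genotype` is a hypothetical name. The only figure taken from the document is the 15 Da C/T mass difference in Table 1.

```python
def call_genotype(peaks, expected: dict, tolerance_da: float = 3.0) -> str:
    """Report which alleles have an observed peak within tolerance of their expected mass.

    `expected` maps allele -> expected fragment mass (Da). The tolerance must be
    less than half the smallest gap between expected masses, otherwise a single
    peak could be assigned to two alleles.
    """
    masses = sorted(expected.values())
    gaps = [b - a for a, b in zip(masses, masses[1:])]
    if gaps and 2 * tolerance_da >= min(gaps):
        raise ValueError("tolerance too wide to separate the alleles")
    present = sorted(
        allele for allele, mass in expected.items()
        if any(abs(peak - mass) <= tolerance_da for peak in peaks)
    )
    return "/".join(present) if present else "no call"


# Placeholder masses, not values from this example: a C-containing fragment and
# its T-containing counterpart, which is 15 Da heavier (Table 1).
expected = {"C": 5000.0, "T": 5015.0}
print(call_genotype([5015.4], expected))          # T
print(call_genotype([4999.2, 5014.7], expected))  # C/T
```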

Example 3

[0297] Screening the ApoE Gene for Polymorphism

[0298] PCR primers were selected automatically by a computer program that attempts to match forward and reverse primers in terms of GC content, melting temperature, and lack of base complementarity. The parameters of the program were set to select primers approximately 500 base pairs apart from each other, with at least 50 base pairs of overlap between adjacent PCR products. Primers were received in 96-well microtiter plates and resuspended in sterilized deionized water at a concentration of 5 pmol/μl. PCR reactions were set up using a programmed Packard robot to pipet a master mix of 1× PCR buffer, polymerase and template into 96-well plates. Starting PCR conditions were: 10 mM Tris (pH 8.3), 50 mM KCl, 1.5 mM MgCl₂, 0.2 mM dNTPs, 0.83 μM forward and reverse primers, 0.7 units of AmpliTaq Gold (PE Corp) and 25 ng of genomic template, in a volume of 30 μl. Cycling was done on MJ PTC-200 PCR machines with the following cycle conditions: denature 12 minutes at 95° C., followed by 35 cycles of: denature 15 seconds at 94° C., anneal 30 seconds at 60° C., extend 45 seconds at 72° C., followed by a ten-minute extension at 72° C. PCR success was then tested by analyzing products on 6% Long Ranger acrylamide gels. Products passed if they exhibited clean bands stronger than a 15 ng standard, with little to no secondary amplification products. Efforts to optimize conditions for failed PCR products began with systematic variation of temperature, cosolvents (particularly PCR enhancer from GIBCO/BRL) and polymerase (Platinum Taq from GIBCO/BRL vs. AmpliTaq Gold). PCR products not optimized by these modifications were discarded, and one or two new PCR primers were ordered and the process repeated until successful amplicons were produced.
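The amplicon layout implied by these design parameters (products of about 500 bp with at least 50 bp of overlap) can be sketched in Python as a simple tiling. This sketch only places amplicon coordinates; it does not perform the primer selection by GC content, melting temperature and complementarity described above. The function name and its defaults are assumptions.

```python
def tile_amplicons(region_start: int, region_end: int,
                   amplicon_len: int = 500, min_overlap: int = 50) -> list:
    """Return inclusive (start, end) coordinates of overlapping amplicons.

    Adjacent amplicons overlap by exactly `min_overlap` bases; the last amplicon
    may run past `region_end` so that the whole region is covered.
    """
    if not 0 <= min_overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_len - min_overlap
    tiles = []
    start = region_start
    while True:
        end = start + amplicon_len - 1
        tiles.append((start, end))
        if end >= region_end:
            return tiles
        start += step


print(tile_amplicons(1, 1000))  # [(1, 500), (451, 950), (901, 1400)]
```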

[0299] Optimized PCR primer pairs were used to perform DNA cycle sequencing using ABI BigDye DNA sequencing kits according to the instructions provided with the kits, except that kit reagents were diluted 1:8 and A, G, C and T reactions were set up robotically in a volume of 20 μl.

[0300] Sequencing reactions were run on ABI 377 or ABI 3700 automated DNA sequencing instruments. ABI 377 and ABI 3700 run times were similar, approximately 4 hours at approximately 5000 volts. Data were collected automatically using ABI collection software. The quality of DNA sequencing reactions was assessed automatically and numerically scored using the program PHRED. Only DNA sequence of quality level 30 or higher was considered acceptable for analysis.
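A PHRED quality value Q relates to the estimated base-call error probability P by Q = -10·log₁₀(P), so the quality-30 threshold corresponds to roughly one error per 1,000 base calls. The Python sketch below shows the conversion and a per-base filter; applying the threshold base by base is an assumption about how the cutoff was used, and the function names are illustrative.

```python
def phred_error_probability(q: float) -> float:
    """Estimated probability that a base call is wrong, from Q = -10*log10(P)."""
    return 10 ** (-q / 10)


def high_quality_positions(qualities, threshold: int = 30) -> list:
    """Indices of base calls at or above the quality threshold."""
    return [i for i, q in enumerate(qualities) if q >= threshold]


print(phred_error_probability(30))               # ~0.001 (1 error in 1,000 calls)
print(high_quality_positions([12, 30, 45, 29]))  # [1, 2]
```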

[0301] Raw sequencing reactions were then imported into a custom database and analyzed using PHRED, PHRAP and POLYPHRED, and the CONSED viewer was then used to visually inspect the data and verify variances. The custom database was used to track all samples in process and to serve as a virtual notebook reference for all sample handling steps, as well as for data generation, manipulation and presentation.

TABLE 1
Mass differences (Da) between the nucleotides dATP, dCTP, dGTP, dTTP, and BrdUTP.

          dA     dC     dG     dT     BrdU
dATP
dCTP     24.0
dGTP     16.0   40.0
dTTP      9.0   15.0   25.0
BrdUTP   55.8   79.8   39.8   64.8
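The entries of Table 1 for the four standard nucleotides can be reproduced from average residue masses. The Python sketch below assumes the residue masses used in the common oligonucleotide molecular-weight formula (dA 313.21, dC 289.18, dG 329.21, dT 304.20 Da); BrdU is omitted because its residue mass is not given in this document. Because end-group chemistry cancels in a difference, a fragment carrying T instead of C at the variant site is about 15 Da heavier, whatever the fragment's termini.

```python
from itertools import combinations

# Average masses (Da) of deoxynucleotide residues in a DNA chain, the values
# used in the common oligonucleotide molecular-weight formula.
RESIDUE_MASS = {"dA": 313.21, "dC": 289.18, "dG": 329.21, "dT": 304.20}


def mass_differences(masses: dict) -> dict:
    """Pairwise absolute mass differences, rounded to 0.1 Da as in Table 1."""
    return {
        (a, b): round(abs(masses[a] - masses[b]), 1)
        for a, b in combinations(masses, 2)
    }


diffs = mass_differences(RESIDUE_MASS)
print(diffs[("dC", "dT")])  # 15.0: a C/T substitution, as at ApoE nt 21250
print(diffs[("dA", "dG")])  # 16.0: the complementary G/A change on the other strand
```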

[0302] TABLE 2 ApoE genomic sequence (GenBank accession AB012576) withpolymorphisms indicated (partial sequence of the accession) 14701ctggtggagc atctgatggg tgtttgggcc aagctggagc tttgtccatc ccctcttatt 14761tttctgcact tgactctctt atttttctga gactggtctc cctctgtcgc ccaggctaga 14821gtgcagcagt gcaactgcgg ctcactgcag cctccacctc ccgggctcaa gcagccttcc 14881cacctcagcc tcctgagtag ctaggaccac aggtgtatgc caccaggccc agctaatttt 14941tttgatagtt ttgggagaca tgggggtttc accatgttgc ccaggctggt ctcgaactcc 15001tggactcaag ccttggcctc ccaaagtgct gggattatag gtgtgagcca ccacacccag 15061ccagggtaga aggcactttg gaagcctcga gcctgcccca ttcatcttac gttagtggaa 15121actgaggctt ccagaggttt caaggtcaca actaaatcca gaacctcatc tcaggcacac 15181tggtcgtagt cccaatgtcc agtcttaagt cttcttggat atctgtggct cacagatttt 15241gggtgtttga gcctcctgct gagcactgct ggggccacag cggtgaccag ccctgtcttc 15301acgggactca gtgagaggaa cagattcatc cgcagagtgg qcaggactag gttgggggaa 15361cccaggggtc tagagggctt ttcagagggc aggggtcact gagcggagag cagaggagga 15421gtgagccatt tgctccagcg tgaagttgtt ggtgtgatgg ggtttcaggg tggcaggagc 15481agtgtggtta aaggtctgga agctgtcggc atgtggctgg tatccaaggt ggccaggaac 15541tctgcatgga tatggtggga agctggcacg cctctcacct cagctcttcc ctgcaggctc 15601tgtggatagc aactggatcg tgggtgccac gctggagaag aagctcccac ccctgcccct 15661gacactggcc cttggggcct tcctgaatca ccgcaagaac aagtttcagt gtggctttgg 15721cctcaccatc ggctgagccc tcctggcccc cgccttccac gcccttccga ttccacctcc 15781acctccacct ccccctgcca cagaggggag acctgagccc ccctcccttc cctcccccct 15841tgggggtcgg gggggacatt ggaaaggagg gaccccgcca ccccagcagc tgaggagggg 15901attctggaac tgaatggcgc ttcgggattc tgagtagcag gggcagcatg cccagtgggc 15961ctggggtccc gggagggatt ccggaattga ggggcacgca ggattctgag caccaggggc 16021agaggcggcc agacaacctc aggqaggagt gtcctggcgt ccccatcctc caaagggcct 16081gggcccgccc cgagggggca gcgagaggag cttccccatc cccggtcagt ccaccctgcc 16141ccgtccactt tcccatctcc tcggtataaa tcatgtttat aagttatgga agaaccggga 16201cattttacag aaaaaaaaca aaaaacaaca aaaaatatac gtgggaaaaa aaacgatggg 16261aggcctccgt 
tttctcaagt gtgtctggcc tgttttgagc atttcatccg gagtctggcc 16321gccctgacct tcccccagcc gcctgcaggg ggcgccagag ggccggagca cggaaagcag 16381cggatccttg atgctgcctt aagtccggct cagaggggcg cagcgtggcc tggggtcgct 16441atcttcccat ccggaacatc tgccctgctg ggggacacta cgggccttcc cttgcctgag                                          nt16541   * 16501 ggtagggtctcaaggtcact tgcccccagc ttgacctggc  ggagtggct atagaggact 16561 ttgtccctgcagactgcagc agcagagatg acactgtctc tgagtgcaga gatgggggca 16621 gggagctgggagagggttca agctactgga acagcttcag aacaactagg gtactaggaa 16681 ctgctgtgtcagggagaagg ggctcaagga ctcgcaggcc tgggaggagg ggcctaggcc    nt16747    *16741 agccat gga gttgggtcac ctgtgtctga ggacttggtg ctgtctggat tttgccaacc16801 tagggctggg gtcagctgat gcccaccacg actcccgagc ctccaggaac tgaaaccctg16861 tctgccccca gggtctgggg aaggaggctg ccgagtagaa ccaaccccag gttaccaacc                                              nt16965    * 16921ccacctcagc caccccttgc cagccaaagc aaacaggccc ggcc ggcac tgggggttcc                                                 nt17030    * 16981ttctcgaacc aggagttcag cctcccctga cccgcagaat cttctgatc cacccgctcc                                                           nt17098    *17041 aggagccagg aatgagtccc agtctctccc agttctcact gtgtggtttt gccattc tc17101 ttgctgctga accacgggtt tctcctctga aacatctggg atttataaca gggcttagga17161 aagtgacagc gtctgagcgt tcactgtggc ctgtccattg ctagccctaa cataggaccg17221 ctgtgtgcca gggctgtcct ccatgctcaa tacacgttag cttgtcacca aacatacccg17281 tgccgctgct ttcccagtct gatgagcaaa ggaacttgat gctcagagag gacaagtcat                                              nt17387    * 17341ttgcccaagg tcacacagct ggcaactggc agagccagga ttcacg cct ggcaatttga 17401ctccagaatc ctaaccttaa cccagaagca cggcttcaag cccctggaaa ccacaatacc 17461tgtggcagcc agggggaggt gctggaatct catttcacat gtggggaggg ggctcccctg 17521tgctcaaggt cacaaccaaa gaggaagctg tgattaaaac ccaggtccca tttgcaaagc 17581ctcgactttt agcaggtgca tcatactgtt cccacccctc ccatcccact tctgtccagc 17641cgcctagccc cactttcttt tttttctttt tttgagacag tctccctctt 
gctgaggctg 17701gagtgcagtg gcgagatctc ggctcactgt aacctccgcc tcccgggttc aagcgattct                        nt17785    * 17761 cctgcctcag cctcccaagt agctggatt acaggcgccc gccaccacgc ctggctaact                                                          nt17874    *17821 tttgtatttt tagtagagat ggggtttcac catgttggcc aggctggtct caa ctcctg                                                             nt17937   * 17881 accttaagtg attcgcccac tgtggcctcc caaagtgctg ggattacagg cgtgacacc 17941 gcccccagcc cctcccatcc cacttctgtc cagcccccta gccctactttctttctggga 18001 tccaggagtc cagatcccca gccccctctc cagattacat tcatccaggcacaggaaagg 18061 acagggtcag gaaaggagga ctctgggcgg cagcctccac attccccttccacgcttggc                         nt18145    * 18121 ccccagaatggaggagggtg tctg attac tgggcgaggt gtcctccctt cctggggact 18181 gtgggqggtggtcaaaagac ctctatgccc caccttcctc ctccctctgc cctgctgtgc 18241 ctggggcagggggagaacag cccacctcgt gactgggggc tggcccagcc cgccctatcc 18301 ctgggggagggggcgggaca gggggagccc tataattgga caagtctggg atccttgagt 18361 cctactcagcCCCAGCGGAG GTGAAGGACG TCCTTCCCCA GGAGCCGgtg agaagcgcag                                                        nt18476    *18421 tcgggggcac ggggatgagc tcaggggcct ctagaaagag ctgggaccct gggaa ccct18481 ggcctccagg tagtctcagg agagctactc ggggtcgggc ttggggagag gaggagcggg18541 ggtgaggcaa gcagcagggg actggacctg ggaagggctg ggcagcagag acgacccgac18601 ccgctagaag gtggggtggg gagagcagct ggactgggat gtaagccata gcaggactCC18661 acgagttgtc actatcattt atcgagcacc tactgggtgt ccccagtgtc ctcagatctc18721 cataactggg gagccagggg cagcgacacg gtagctagcc gtcgattgga gaactttaaa18781 atgaggactg aattagctca taaatggaac acggcgctta actgtgaggt tggagcttag18841 aatgtgaagg gagaatgagg aatgcgagac tgggactgag atggaaccgg cggtggggag18901 ggggtggggg gatggaattt gaaccccggg agaggaagat ggaattttct atggaggccg18961 acctggggat ggggagataa gagaagacca ggagggagtt aaatagggaa tgggttgggg19021 gcggcttggt aaatgtgctg ggattaggct gttgcagata atgcaacaag gcttggaagg19081 ctaacctggg gtgaggccgg gttggggccg ggctgggggt gggaggagtc 
ctcactggcg19141 gttgattgac agtttctcct tccccagACT GGCCAATCAC AGGCAGGAAG ATGAAGGTTC19201 TGTGGGCTGC GTTGCTGGTC ACATTCCTGG CAGG tatggg ggcggggctt gctcggttcc                                                     nt19311    * 19261ccccgctcct ccccctctca tcctcacctc aacctcctqg ccccattcag cagaccctg 19321ggccccctct tctgaggctt ctgtgctgct tcctggctct gaacagcgat ttgacgctct 19381ctgggcctcg gtttccccca tccttgagat aggagttaga agttgttttg ttgttgttgt 19441ttgttgttgt tgttttgttt ttttgagatg aagtctcgct ctgtcgccca ggctggagtg 19501cagtggcggg atctcggctc actgcaagct ccgcctccca ggtccacgcc attctcctgc 19561ctcagcctcc caagtagctg ggactacagg cacatgccac cacacccgac taactttttt 19621gtattttcag tagagacggg gtttcaccat gttggccagg ctggtctgga actcctgacc 19681tcaggtgatc tgcccgtttc gatctcccaa agtgctggga ttacaggcgt gagccaccgc 19741acctggccgg gagttagagg tttctaatgc attgcaggca gatagtgaat accagacacg 19801gggcagctgt gatctttatt ctccatcacc cccacacagc cctgcctggg gcacacaagg 19861acactcaata catgcttttc cgctgggcgc ggtggctcac ccctgtaatc ccagcacttt 19921gggaggccaa ggtgggagga tcacttgagc ccaqgagttc aacaccagcc tgggcaacat 19981agtgagaccc tgtctctact aaaaatacaa aaattagcca ggcatggtgc cacacacctg 20041tgctctcagc tactcaggag gctgaggcag gaggatcgct tgagcccaga aggtcaaggt 20101tgcagtgaac catgttcagg ccgctgcact ccagcctggg tgacagagca agaccctgtt 20161tataaataca taatgctttc caagtgatta aaccgactcc cccctcaccc tgcccaccat 20221ggctccaaag aagcattt.gt ggagcacctt ctgtgtgccc ctaggtacta gatgcctgga                                                nt20334 (A18T)    *20281 cggggtcaga aggaccctga cccaccttga acttgttcca cacaggATGC CAG CCAAGG20341 TGGAGCAAGC GGTGGAGACA GAGCCGGAGC CCGAGCTGCG CCAGCAGACC GAGTGGCAGA20401 GCGGCCAGCG CTGGGAACTG GCACTGGGTC GCTTTTGGGA TTACCTGCGC TGGGTGCAGA20461 CACTGTCTGA GCAGGTGCAG GAGGAGCTGC TCAGCTCCCA GGTCACCCAG GAACTGAGGt20521 gagtgtcccc atcctggccc ttgaccctcc tggtgggcgg ctatacctcc ccaggtccag20581 gtttcattct gcccctgtcg ctaagtcttg gggggcctgg gtctctgctg gttctagctt20641 cctcttccca tttctgactc ctggctttag ctctctggaa ttctctctct cagctttgtc20701 
tctctctctt cccttctgac tcagtctctc acactcgtcc tggctctgtc tctgtccttc20761 cctagctctt ttatatagag acagagagat ggggtctcac tgtgttgccc aggctggtct20821 tqaacttctg ggctcaagcg atcctcccgc ctcggcctcc caaagtgctg ggattagagg20881 catgagccac cttgcccggc ctcctagctc cttcttcgtc tctgcctctg ccctctgcat20941 ctgctctctg catctgtctc tgtctccttc tctcggcctc tgccccgttc cttctctccc21001 tcttgggtct ctctggctca tccccatctc gcccgcccca tcccagccct tctccccgcc21061 tcccactgtg cgacaccctc ccgccctctc ggccgcagg G CGCTGATGGA CGAGACCATG21121 AAGGAGTTGA AGGCCTACAG ATCGGAACTG GAGGAACAAC TGACCCCGGT GGCGGAGGAG21181 ACGCGGGCAC GGCTGTCCAA GGAGCTGCAG GCGGCGCAGG CCCQGCTGGG CGCGGACATGnt21250 (C130R) 21241GAGGACGTG GCGGCCGCCT GGTGCAGTAC CGCGGCGAGG TGCAGGCCAT GCTCGGCCAG                                          nt21349 (R163C) 21301AGCACCGAGC AGCTGCGGGT GCGCCTCGCC TCCCACCTGC GCAAGCTG G TAAGCGGCTC                  nt21388 (R176C) 21361CTCCGCGATG CCGATGACCT GCAGAAG GC CTGGCAGTGT ACCAGGCCGG GGCCCGCGAG 21421GGCGCCGAGC GCGGCCTCAG CGCCATCCGC GAGCGCCTGG GGCCCCTGGT GGAACAGGGC 21481CGCGTGCGGG CCGCCACTGT GGGCTCCCTG GCCGGCCAGC CGCTACAGGA GCGGGCCCAG 21541GCCTGGGGCG AGCGGCTGCG CGCGCGGATG GAGGAGATGG GCAGCCGGAC CCGCGACCGC 21601CTGGACGAGG TGAAGGAGCA GGTGGCGGAG GTGCGCGCCA AGCTGGAGGA GCAGGCCCAG 21661CAGATACGCC TGCAGGCCGA GGCCTTCCAG GCCCGCCTCA AGAGCTGGTT CGAGCCCCTG 21721GTGGAAGACA TGCAGCGCCA GTGGGCCGGG CTGGTGGAGA AGGTGCAGGC TGCCGTGGGC 21781ACCAGCGCCG CCCCTGTGCC CAGCGACAAT CACTGA ACGC CGAAGCCTGC AGCCATGCGA 21841CCCCACGCCA CCCCGTGCCT CCTGCCTCCG CGCAGCCTGC AGCGGGAGAC CCTGTCCCCG 21901CCCCAGCCGT CCTCCTGGGG TGQACCCTAG TTTAATAAAG ATTCACCAAG TTTCACGCat 21961ctgctggcct ccccctgtga tttcctctaa gccccagcct cagtttctct ttctgcccac 22021atactggcca cacaattctc agccccctcc tctccatctg tgtctgtgtg tatctttctc 22081tctgcccttt tttttttttt tagacggagt ctggctctgt cacccaggct agagtgcagt 22141ggcacgatct tggctcactg caacctctgc ctcttgggtt caagcgattc tgctgcctca 22201gtagctggga ttacaggctc acaccaccac acccggctaa tttttgtatt tttagtagag 22261acgagctttc accatgttgg ccaggcaggt ctcaaactcc 
tgaccaagtg atccacccgc 22321cgqcctccca aagtgctgag attacaggcc tgagccacca tgcccggcct ctgcccctct 22381ttctttttta gggggcaggg aaaggtctca ccctgtcacc cgccatcaca gctcactgca 22441gcctccacct cctggactca agtgataagt gatcctcccg cctcagcctt tccagtagct 22501gagactacag gcgcatacca ctaggattaa tttggggggg gggtggtgtg tgtggagatg 22561gggtctggct ttgttggcca ggctgatgtg gaattcctgg gctcaagcga tactcccacc 22621ttggcctcct gagtagctga gactactggc tagcaccacc acacccagct ttttattatt 22681atttgtagag acaaggtctc aatatgttgc ccaggctagt ctcaaacccc tgggctcaag 22741agatcctccg ccatcggcct cccaaagtgc tgggattcca ggcatggggc tccgagcccg 22801gcctgcccaa cttaataata cttgttcctc agagttgcaa ctccaaatga cctgagattg 22921cctccagtct tatctgatat ctgcctcctt cccacccacc ctgcacccca tcccacccct 22981ctgtctctcc ctgttctcct caggagactc tggcttcctg ttttcctcca cttctatctt 23041ttatctctcc ctcctacggt ttcttttctt tctccccggc ctgcttgttt ctcccccaac 23101ccccttcatc tggatttctt cttctgccat tcagtttggt ttgagctctc tgcttctccg 23161gttccctctg agctagctgt cccttcaccc actgtgaact gggtttccct gcccaaccct 23221cattctcttt ctttctttct tttttttttt tttttttttt tttttttttt gagacagagt 23281cttgctctgt tgcccagcct ggagtgcagt ggtgcaatct tggttcactg caacctccac 23341ttcccagatt caagcaattc tcctgcctca gcctccagag tagctgggat tacaggcgtg 23401tcccaccaca cccgactaat tttztgtattt ttggtagaga caaggcttcg gcattgttgg 23461ccaggcaggt ctcgaactcc tgacctcaag taatctgcct gcctcaccct cccaaagtgc nt23524    * 23521 tgg attaca ggcatgagcc acctcacccg gaccatccctcattctccat cctttcctcc 23581 agttgtgatg tctacccctc atgtttccca acaagcctactgggtgctga atccaggctg 23641 ggaagagaag ggagcggctc ttctgtcgga gtctgcaccaggcccatgct gagacgagag     nt23707 *                                              nt23759     23701 ctggcgtca gagaggggaa gcttggatgg aagcccagga gccgccggca ctctcttc c                                              nt23805    * 23761ctcccacccc ctcagttctc agagacgggg aggagggtic ccac aacgg gggacaggct 23821gagacttgag cttgtatctc ctgggccagc tgcaacatct gcttgtccct ctgcccatct 23881tggctcctgc acaccctgaa cctggtgctt tccctggcac tgctctgatc 
acccacgtgg 23941aggcagcacc cctcccctgg agatgactca ccagggctga gtgaggaggg gaagggtcag 24001tgtgctcaca ggcagggggc ctggtctgct. gggcctgctg ctgattcacc gtatgtccag   BREAK 36601 catgcgttag gagggacatt tcaaactctt ttttacccta gactttcctaccatcaccca 36661 gagtatccag ccaggagggg aggggctaga gacaccagaa gtttagcagggaggagqgcg 36721 tagggattcg gggaatgaag ggatgggatt cagactaggg ccaqgacccagggatggaga 36781 gaaagagatg agagtggttt gggggcttgg tgacttagag aacagagctgcaggctcaga 36841 ggcacacagg agtttctggg ctcaccctgc ccccttccaa cccctcagttcccatcctcc 36901 agcagctgtt tgtgtgctgc ctctgaagtc cacactgaac aaacttcagcctactcatgt 36961 ccctaaaatg ggcaaacatt gcaagcagca aacagcaaac acacagccctccctgcctgc 37021 tgaccttgga gctggggcag aggtcagaga cctctctggg cccatgccacctccaacatc 37081 cactcgaccc cttggaattt cggtggagag gagcagaggt tgtcctggcgtggtttaggt 37141 agtgtgagag ggtccgggtt caaaaccact tgctgggtgg ggagtcgtcagtaagtggct                                      nt37237    * 37201atgccccgac cccgaagcct gtttccccat ctgtac atg gaaatgataa agacgcccat 37261ctgatagggt ttttgtggca aataaacatt tggttttttt gttttgtttt gttttgtttt 37321ttgagatgga ggtttgctct gtcgcccagg ctggagtgca gtgacacaat ctcatctcac 37381cacaaccttc ccctgcctca gcctcccaag tagctgggat tacaagcatg tgccaccaca 37441cctggctaat tttctatttt tagtagagac gggtttctcc atgttggtca gcctcagcct 37501cccaagtaac tgggattaca ggcctgtgcc accacacccg gctaattttt tctatttttg 37561acagggacgg ggtttcacca tgttggtcag gctggtctag aactcctgac ctcaaatgat 37621ccacccacct aggcctccca aagtgcacag attacaggcg tgggccaccg cacctggcca BREAK41821 aaaagatggt cttgtggggt aatgaaggac acaagcttgg tgggacctga gtccccaggc41881 tggcatagag ccccttactc cctgtgt
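In Table 2, each polymorphic site is marked by a blank at its position in the numbered sequence (for example, the blank after the ninth base of the line beginning at 21241 falls at nt 21250). A minimal Python sketch for reading such a line is shown below. It assumes the clean layout of a start coordinate, one space, then 10-base blocks separated by single spaces, with no trailing padding; the extracted text above does not always preserve that spacing. The function name is hypothetical, and base case is normalized to upper case.

```python
import re
from typing import Dict


def parse_sequence_line(line: str) -> Dict[int, str]:
    """Map each coordinate on one numbered sequence line to its base.

    A blank inside a 10-base block marks a polymorphic site and is returned as "N".
    """
    match = re.match(r"\s*(\d+) ", line)
    if match is None:
        raise ValueError("line does not start with a coordinate")
    start = int(match.group(1))
    body = line[match.end():].rstrip("\n")
    bases = {}
    for i, char in enumerate(body):
        block, offset = divmod(i, 11)
        if offset == 10:  # single-space separator between 10-base blocks
            continue
        bases[start + block * 10 + offset] = "N" if char == " " else char.upper()
    return bases


# The 21241 line of Table 2 laid out in the clean format (first two blocks only).
bases = parse_sequence_line("21241 GAGGACGTG  GCGGCCGCCT")
print(bases[21249], bases[21250], bases[21251])  # G N G
```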

[0303]
VGNX Symbol / Database: GEN- GEN- GEN- GEN- GEN- GEN- GEN- VGNX CBX CBX CBX CBX CBX PO GEN-PO CBX GEN- 1494 1557 1765 2096 29311 112 4484969 CBX 5008
Sites (GenBank position, allele 1/allele 2, amino acid change): 17874 A/T (silent); 17937 T/C (silent); 18145 T/G (silent); 18476 C/G (silent); 19311 G/A (silent); 20334 G/A (Ala/Thr); 21250 T/C (Cys/Arg); 21349 C/T (Arg/Cys); 21388 C/T (Arg/Cys)
Alleles in each row are listed in the same site order, 17874 through 21388; [XY] marks a heterozygous site.
WWP# | Genotype | Haplotype 1 | Haplotype 2
1 | A T [TG] G [GA] G [TC] C C | A T G G G G T C C | A T T G A G C C C
3 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
5 | [AT] T G G G G T C C | T T G G G G T C C | A T G G G G T C C
6 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
7 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
8 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
11 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
12 | A T [TG] G G G T C C | A T T G G G T C C | A T G G G G T C C
13 | A T [TG] G [GA] G [TC] C C | A T G G G G T C C | A T T G A G C C C
14 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
15 | A [TC] [TG] [CG] G G T C [CT] | A C G G G G T C T | A T T C G G T C C
16 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
17 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
18 | A T T C G [GA] T C C | A T T C G G T C C | A T T C G A T C C
19 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
20 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
21 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
24 | A T [TG] [CG] G G T [CT] C | A T G G G G T T C | A T T C G G T C C
25 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
26 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
27 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
28 | A C G G G G T C T | A C G G G G T C T | A C G G G G T C T
29 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
30 | A T [TG] C G G T C C | A T T C G G T C C | A T G C G G T C C
31 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
32 | A T [TG] C G G T C C | A T T C G G T C C | A T G C G G T C C
33 | [AT] T [TG] [CG] G G T C C | T T T C G G T C C | A T G G G G T C C
34 | A T [TG] G [GA] G [TC] C C | A T G G G G T C C | A T T G A G C C C
35 | A [TC] [TG] [CG] G G T C [CT] | A C G G G G T C T | A T T C G G T C C
36 | A [TC] [TG] [CG] G G T C C | A T G G G G T C C | A C T C G G T C C
38 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
39 | [AT] [TC] [TG] [CG] G G T C [CT] | T T T C G G T C C | A C G G G G T C T
40 | A T [TG] G G G T C C | A T T G G G T C C | A T G G G G T C C
41 | [AT] T G [CG] G G [TC] C C | T T G C G G T C C | A T G G G G C C C
42 | A [TC] [TG] [CG] G G T C C | A T G G G G T C C | A C T C G G T C C
44 | A C T C G G T C C | A C T C G G T C C | A C T C G G T C C
45 | A T T [CG] [GA] G [TC] C C | A T T C G G T C C | A T T G A G C C C
46 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
47 | [AT] T [TG] G G G [TC] C C | T T T G G G C C C | A T G G G G T C C
48 | [AT] [TC] [TG] [CG] G G T C [CT] | T T T C G G T C C | A C G G G G T C T
49 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
50 | A T T [CG] [GA] G [TC] C C | A T T C G G T C C | A T T G A G C C C
51 | A [TC] [TG] G [GA] G [TC] C [CT] | A C G G G G T C T | A T T G A G C C C
52 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
54 | A [TC] [TG] [CG] G G T C C | A T G G G G T C C | A C T C G G T C C
58 | [AT] T [TG] [CG] G G T C C | T T T C G G T C C | A T G G G G T C C
59 | A T G G G G T [CT] C | A T G G G G T C C | A T G G G G T T C
60 | [AT] T G G G G T C [CT] | T T G G G G T C T | A T G G G G T C C
61 | [AT] T [TG] G [GA] G [TC] C [CT] | T T G G G G T C T | A T T G A G C C C
62 | [AT] T G G G G T C [CT] | T T G G G G T C T | A T G G G G T C C
63 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
66 | A [TC] G G G G T C [CT] | A C G G G G T C T | A T G G G G T C C
67 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
68 | [AT] T T [CG] [GA] G [TC] C C | T T T C G G T C C | A T T G A G C C C
69 | A T [TG] G [GA] G [TC] C C | A T G G G G T C C | A T T G A G C C C
70 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
71 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
72 | A [TC] [TG] [CG] G G T C [CT] | A C G G G G T C T | A T T C G G T C C
73 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
74 | [AT] T [TG] [CG] G G T C C | T T G G G G T C C | A T T C G G T C C
75 | [AT] T [TG] [CG] G G T C C | T T G G G G T C C | A T T C G G T C C
78 | [AT] T [TG] [CG] G G T [CT] C | T T T C G G T C C | A T G G G G T T C
79 | [AT] T T G G G C C C | T T T G G G C C C | A T T G G G C C C
80 | T T [TG] [CG] G G [TC] C C | T T G G G G C C C | T T T C G G T C C
81 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
84 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
93 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
95 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
101 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
102 | A T T C G G T C C | A T T C G G T C C | A T T C G G T C C
109 | [AT] T G G G G T C C | A T G G G G T C C | T T G G G G T C C
110 | A T [TG] [CG] G G T C C | A T G G G G T C C | A T T C G G T C C
111 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
112 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
113 | [AT] T [TG] [CG] G G T C C | T T T C G G T C C | A T G G G G T C C
114 | [AT] T [TG] G [GA] G [TC] C C | T T G G G G T C C | A T T G A G C C C
115 | A T G G G G T C C | A T G G G G T C C | A T G G G G T C C
116 | [AT] T [TG] G G G C C C | T T T G G G C C C | A T G G G G C C C
117 | [AT] T T [CG] G G [TC] C C | T T T G G G C C C | A T T C G G T C C
118 | [AT] T T [CG] G G [TC] C C | T T T G G G C C C | A T T C G G T C C
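Each Table 3 row pairs an unphased genotype with two resolved haplotypes, and the two haplotypes must together reproduce the genotype at every site. The sketch below is a hypothetical helper, not software described in this application: it parses the bracket notation, checks that a haplotype pair explains a genotype, and enumerates every phase-consistent pair. With k heterozygous sites a genotype alone admits 2^(k-1) unordered pairs, which is why haplotyping, rather than genotyping alone, is needed to choose among them.

```python
from itertools import product


def parse_genotype(text):
    """Turn 'A T [TG] G' into per-site allele sets; '[XY]' marks a heterozygous site."""
    return [frozenset(token.strip("[]")) for token in text.split()]


def explains(genotype, hap1, hap2):
    """True if the two haplotypes together reproduce the genotype at every site."""
    h1, h2 = hap1.split(), hap2.split()
    if not len(genotype) == len(h1) == len(h2):
        return False
    return all({a, b} == site for site, a, b in zip(genotype, h1, h2))


def candidate_pairs(genotype):
    """Every unordered haplotype pair consistent with an unphased genotype."""
    het = [i for i, site in enumerate(genotype) if len(site) == 2]
    base = [min(site) for site in genotype]  # homozygous sites hold a single allele
    pairs = set()
    for phase in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, flip in zip(het, phase):
            a, b = sorted(genotype[i])
            h1[i], h2[i] = (b, a) if flip else (a, b)
        pairs.add(tuple(sorted((" ".join(h1), " ".join(h2)))))
    return pairs


# WWP# 1 from Table 3: heterozygous at 18145, 19311 and 21250.
g1 = parse_genotype("A T [TG] G [GA] G [TC] C C")
print(explains(g1, "A T G G G G T C C", "A T T G A G C C C"))  # True
print(len(candidate_pairs(g1)))                                  # 4, i.e. 2**(3 - 1)
```

The table reports one of those four candidate pairs for WWP# 1; the check above only confirms that the reported pair is consistent with the genotype, not that it is the correct phase.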

[0304] TABLE 4 ApoE haplotypes
# | 16541 16747 16965 17030 17098 17387 17785 17874 17937 18145
1 | C G C C G C A A T T
2 | C G C C G C A A T T
3 | C G C C G C A A T T
4 | C G C C G C A A T
5 | C G C C G C A A T
6 | C G C C G C A A T G
7 | C G C C A C A A T
8 | C G C C A C A A T G
9 | C G C C A C A A T G
10 | C G T C A C A A T G
11 | C G C C A C A T T G
12 | C T C G G C G A T
13 | C T C G G C G A T
14 | C G T C A C A T G
15 | C G C C C A A T
16 | C G C C G C A A T
17 | C G C C C A A T G
18 | C G C C C A A T G
19 | T G C C C A T
20 | C C G G C A T
21 | C T C G C A T
22 | C C G G C A T
23 | C G C C C A A T
24 | C G C C T A T G
25 | C G C C A T T G
26 | C G T C A C A T G
27 | C G C A C A T T G
28 | C G C A C A T T G
29 | C G C C G C A A T T
30 | C G C C G C A A T T
31 | C G C C G C A T T
32 | C G C C G C A T T
33 | C G C C G C A T
34 | C G C C G C A T
35 | C G C C G C A T
36 | C G C C G T A T
37 | C G C C G A T T
38 | C G C C C A A C
39 | C G C C C A A
40 | C G C C A C A A
41 | C G C C C A C
42 | C G C C C A
# | 18476 19311 20334 21250 21349 21388 23524 23707 23759 23805
1 | C G G T C C G C T C
2 | C G A T C C G C T C
3 | C G G T C C G C C C
4 | C G G T C C A C T C
5 | C G G T T C G C T C
6 | C G G T C C G C T C
7 | G G G T C C G C T C
8 | G G G T C C G C T C
9 | G G G T C C G C C C
10 | G G G T C C G C C C
11 | G G G T C C G C T C
12 | G G G C C C G C C C
13 | G G A C C C G C C C
14 | G G G T C C G C T C
15 | G G G T C C G C C C
16 | G G G T C C G C C
17 | G G G T C G A C C
18 | G G G T C T G C C
19 | G G G T C C G C C C
20 | G G C C G C C
21 | G G C C G C C
22 | G G C C G C C
23 | G G T C C G C G
24 | G G G C C G C C C
25 | G G G C C G C C C
26 | G G T C C G C C
27 | G G T C C G C C
28 | G G T C C G C T C
29 | A G C C G C C
30 | G C C C G C C
31 | G G C T G C C
32 | G G C T G C C
33 | G A G C T G C C
34 | G G C C T G C C
35 | G G C T G A C C
36 | G G T C G C C
37 | G G T C G C C
38 | G G T C G C
39 | G G T C G A C
40 | G G T C G C
41 | G G T C G C
42 | G G T C T G C

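[0305] TABLE 5 One useful group of ApoE haplotypes. A dash marks a position not included in that haplotype.
# | 16747 | 17030 | 17785 | 19311 | 23707 | Genotype
1 | G | C | A | G | C | E3-like
2 | T | G | G | G | C | E4-like
3 | G | C | A | A | C | E4-like
4 | G | C | A | G | A | E2-like
5 | - | - | A | G | C | E3-like
6 | - | - | G | G | C | E4-like
7 | - | - | A | A | C | E4-like
8 | - | - | A | G | A | E2-like
9 | - | C | - | G | C | E3-like
10 | - | G | - | G | C | E4-like
11 | - | C | - | A | C | E4-like
12 | - | C | - | G | A | E2-like
13 | G | - | - | G | C | E3-like
14 | T | - | - | G | C | E4-like
15 | G | - | - | A | C | E4-like
16 | G | - | - | G | A | E2-like
17 | - | C | A | G | C | E3-like
18 | - | G | G | G | C | E4-like
19 | - | C | A | A | C | E4-like
20 | - | C | A | G | A | E2-like
21 | G | C | - | G | C | E3-like
22 | T | G | - | G | C | E4-like
23 | G | C | - | A | C | E4-like
24 | G | C | - | G | A | E2-like
25 | G | - | A | G | C | E3-like
26 | T | - | G | G | C | E4-like
27 | G | - | A | A | C | E4-like
28 | G | - | A | G | A | E2-like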
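Rows 1 to 4 of Table 5 give full five-position haplotypes, and each later group of four repeats the same E3-like, E4-like, E4-like, E2-like pattern at a subset of those positions. The hypothetical lookup below is a sketch, not the application's own method: it matches an observed haplotype against rows 1 to 4 using only the positions that were typed and returns every label still consistent. A single label means the typed subset is enough to classify; several labels mean more positions are needed.

```python
# Reference rows 1-4 of Table 5; rows 5-28 repeat these alleles at subsets of positions.
POSITIONS = (16747, 17030, 17785, 19311, 23707)
REFERENCE = (
    ("G C A G C", "E3-like"),
    ("T G G G C", "E4-like"),
    ("G C A A C", "E4-like"),
    ("G C A G A", "E2-like"),
)


def classify(observed):
    """Return the Table 5 labels whose reference alleles match every observed site.

    `observed` maps a GenBank position to the base seen there. Positions left out
    are not tested, mirroring the reduced marker sets in rows 5-28.
    """
    unknown = set(observed) - set(POSITIONS)
    if unknown:
        raise ValueError(f"positions not in Table 5: {sorted(unknown)}")
    labels = set()
    for alleles, label in REFERENCE:
        ref = dict(zip(POSITIONS, alleles.split()))
        if all(ref[pos] == base.upper() for pos, base in observed.items()):
            labels.add(label)
    return labels


print(classify({19311: "A", 23707: "C"}))  # {'E4-like'}
print(classify({23707: "C"}))              # both E3-like and E4-like remain possible
```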

[0306]

1 95
SEQ ID NO | Length | Type | Organism | Feature | Sequence
1 | 29 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggatgc gcatgtctg
2 | 55 | DNA | Artificial Sequence | Exemplary motif | caccgcttgc ccccagaatg gaggagggtg tctgtattac tgggcgaggt gtcct
3 | 55 | DNA | Artificial Sequence | Exemplary motif | aggacacctc gcccagtaat acagacaccc tcctccattc tgggggcaag cggtg
4 | 50 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggatgc gcatgtctgt attactgggc gaggtgtcct
5 | 50 | DNA | Artificial Sequence | Exemplary motif | aggacacctc gcccagtaat acagacaccc tcctccattc tgggggcaag
6 | 37 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatgcgcag gtgtctg
7 | 50 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga caccctcctc cattctgggg gcaagcggtg
8 | 53 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatgcgcag gtgtctgtat tactgggcga ggt
9 | 53 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga cacctgcgca tcctcctcca ttctgggggc aag
10 | 44 | DNA | Artificial Sequence | Exemplary motif | tggctggagt tgcgctagca agagtgcagc tgcaaaagga ttta
11 | 50 | DNA | Artificial Sequence | Exemplary motif | cgcctatggc tggagttgcg ctagcaagac caaaaggatt tataaacttc
12 | 50 | DNA | Artificial Sequence | Exemplary motif | gaagtttata aatccttttg gtcttgctag cgcaactcca gccataggcg
13 | 53 | DNA | Artificial Sequence | Exemplary motif | tggctggagt tgcgctagca agacgtgcag ctgcaaaagg atttataaac ttc
14 | 53 | DNA | Artificial Sequence | Exemplary motif | gaagtttata aatccttttg cagctgcacg tcttgctagc gcaactccag cca
15 | 45 | DNA | Artificial Sequence | Exemplary motif | tggctggagt tgcgctagca agaccacagc tggatgaagg attta
16 | 50 | DNA | Artificial Sequence | Exemplary motif | cgcctatggc tggagttgcg ctagcaagac caaaaggatt tataaacttc
17 | 50 | DNA | Artificial Sequence | Exemplary motif | gaagtttata aatccttttg gtcttgctag cgcaactcca gccataggcg
18 | 59 | DNA | Artificial Sequence | Exemplary motif | cgcctatggc tggagttgcg ctagcaagac cacagctgga tgaaggattt ataaacttc
19 | 59 | DNA | Artificial Sequence | Exemplary motif | gaagtttata aatccttcat ccagctgtgg tcttgctagc gcaactccag ccataggcg
20 | 53 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatgcgcag gtgtctgtat tactgggcga ggt
21 | 53 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga cacctgcgca tcctcctcca ttctgggggc aag
22 | 34 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatgcgcag gtgt
23 | 38 | DNA | Artificial Sequence | Exemplary motif | acagacacct gcgcatcctc ctccattctg ggggcaag
24 | 38 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatgcgcag gtgtctgt
25 | 38 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga gagtcggatg ggtgtctg
26 | 50 | DNA | Artificial Sequence | Exemplary motif | caccgcttgc ccccagaatg gaggagggtg tctgtattac tgggcgaggt
27 | 50 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga caccctcctc cattctgggg gcaagcggtg
28 | 54 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga gagtcggatg ggtgtctgta ttactgggcg aggt
29 | 54 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga cacccatccg actctcctcc attctggggg caag
30 | 33 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatgggtgt ctg
31 | 50 | DNA | Artificial Sequence | Exemplary motif | caccgcttgc ccccagaatg gaggagggtg tctgtattac tgggcgaggt
32 | 50 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga caccctcctc cattctgggg gcaagcggtg
33 | 50 | DNA | Artificial Sequence | Exemplary motif | cttgccccca gaatggagga ggatggrgtg tctgtattac tgggcgaggt
34 | 49 | DNA | Artificial Sequence | Exemplary motif | acctcgccca gtaatacaga cacccatcct cctccattct gggggcaag
35 | 22 | DNA | Artificial Sequence | Exemplary motif | atctggannn nnnnnnnnnt cc
36 | 26 | DNA | Artificial Sequence | Exemplary motif | atctggannn nnnnnnnnnt ccagat
37 | 26 | DNA | Artificial Sequence | Exemplary motif | atctggannn nnnnnnnnnt ccagat
38 | 22 | DNA | Artificial Sequence | Exemplary motif | atctccannn nnnnnnnnnt cc
39 | 26 | DNA | Artificial Sequence | Exemplary motif | atctccannn nnnnnnnnnt ccggat
40 | 26 | DNA | Artificial Sequence | Exemplary motif | atccggannn nnnnnnnnnt ccagat
41 | 22 | DNA | Artificial Sequence | Exemplary motif | atccggannn nnnnnnnnnt cc
42 | 26 | DNA | Artificial Sequence | Exemplary motif | atccggannn nnnnnnnnnt ccagat
43 | 26 | DNA | Artificial Sequence | Exemplary motif | atctggannn nnnnnnnnnt ccggat
44 | 26 | DNA | Artificial Sequence | Exemplary motif | atctccannn nnnnnnnnnt ccggat
45 | 26 | DNA | Artificial Sequence | Exemplary motif | atccggannn nnnnnnnnnt ccggat
46 | 26 | DNA | Artificial Sequence | Exemplary motif | tagacctnnn nnnnnnnnna ggtcta
47 | 26 | DNA | Artificial Sequence | Exemplary motif | tagacctnnn nnnnnnnnna ggccta
48 | 26 | DNA | Artificial Sequence | Exemplary motif | taggcctnnn nnnnnnnnna ggtcta
49 | 21 | DNA | Artificial Sequence | Exemplary motif | acacagactc atgcaactct g
50 | 21 | DNA | Artificial Sequence | Exemplary motif | acgcagactc atgcaactct g
51 | 15 | DNA | Artificial Sequence | Exemplary motif | actcatgcaa ctctg
52 | 35 | DNA | Artificial Sequence | Exemplary motif | actcatgcaa ctctgygttc cacttcggcc aagaa
53 | 35 | DNA | Artificial Sequence | Exemplary motif | ttcttggccg aagtggaacr cagagttgca tgagt
54 | 23 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg agt
55 | 23 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg agt
56 | 23 | DNA | Artificial Sequence | Exemplary motif | actcatgcaa ctctgtgttc cac
57 | 23 | DNA | Artificial Sequence | Exemplary motif | actcatgcaa ctctgcgttc cac
58 | 29 | DNA | Artificial Sequence | Exemplary motif | acacagactc atgcaactct gtgttccac
59 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg agtctgtgt
60 | 29 | DNA | Artificial Sequence | Exemplary motif | acacagactc atgcaactct gcgttccac
61 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg agtctgtgt
62 | 29 | DNA | Artificial Sequence | Exemplary motif | acgcagactc atgcaactct gtgttccac
63 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg agtctgcgt
64 | 29 | DNA | Artificial Sequence | Exemplary motif | acgcagactc atgcaactct gcgttccac
65 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg agtctgcgt
66 | 22 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg ag
67 | 22 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg ag
68 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg agtctgtgt
69 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg agtctgtgt
70 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg agtctgcgt
71 | 29 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg agtctgcgt
72 | 23 | DNA | Artificial Sequence | Exemplary motif | gtggaacgca gagttgcatg agt
73 | 23 | DNA | Artificial Sequence | Exemplary motif | gtggaacaca gagttgcatg agt
74 | 30 | DNA | Artificial Sequence | Exemplary motif | gtggaacggc agagttgcat gagtctgcgt
75 | 37 | DNA | Artificial Sequence | Exemplary motif | cccggctggg cgcggacatg ggatgcgcaa ggacgtg
76 | 58 | DNA | Artificial Sequence | Exemplary motif | gcaggcccgg ctgggcgcgg acatggagga cgtgtgcggc cgcctggtgc agtaccgc
77 | 58 | DNA | Artificial Sequence | Exemplary motif | gcggtactgc accaggcggc cgcacacgtc ctccatgtcc gcgcccagcc gggcctgc
78 | 58 | DNA | Artificial Sequence | Exemplary motif | ggcgaggtgc aggccatgct cggccagagc accgaggagc tgcgggtgcg cctcgcct
79 | 58 | DNA | Artificial Sequence | Exemplary motif | aggcgaggcg cacccgcagc tcctcggtgc tctggccgag catggcctgc acctcgcc
80 | 56 | DNA | Artificial Sequence | Exemplary motif | ccacctgcgc aagctgcgta agcggctcct ccgcgatgcc gatgacctgc agaagc
81 | 56 | DNA | Artificial Sequence | Exemplary motif | gcttctgcag gtcatcggca tcgcggagga gccgcttacg cagcttgcgc aggtgg
82 | 18 | DNA | Artificial Sequence | Exemplary motif | gcttctgcag gtcatcgg
83 | 58 | DNA | Artificial Sequence | Exemplary motif | cccggctggg cgcggacatg ggatgcgcaa ggacgtgtgc ggccgcctgg tgcagtac
84 | 58 | DNA | Artificial Sequence | Exemplary motif | gtactgcacc aggcggccgc acacgtcctt gcgcatccca tgtccgcgcc cagccggg
85 | 58 | DNA | Artificial Sequence | Exemplary motif | cgcggcgagg tgcaggccat gctcggccag agcaccgagg agctgcgggt gcgcctcg
86 | 58 | DNA | Artificial Sequence | Exemplary motif | cgaggcgcac ccgcagctcc tcggtgctct ggccgagcat ggcctgcacc tcgccgcg
87 | 59 | DNA | Artificial Sequence | Exemplary motif | cctccacctg cgcaagctgc gtaagcggct cctccgcgat gccgatgacc tgcagaagc
88 | 59 | DNA | Artificial Sequence | Exemplary motif | gcttctgcag gtcatcggca tcgcggagga gccgcttacg cagcttgcgc aggtggagg
89 | 58 | DNA | Artificial Sequence | Exemplary motif | cccggctggg cgcggacatg ggatgcgcaa ggacgtgcgc ggccgcctgg tgcagtac
90 | 58 | DNA | Artificial Sequence | Exemplary motif | gtactgcacc aggcggccgc gcacgtcctt gcgcatccca tgtccgcgcc cagccggg
91 | 58 | DNA | Artificial Sequence | Exemplary motif | cgcggcgagg tgcaggccat gctcggccag agcaccgagg agctgcgggt gcgcctcg
92 | 58 | DNA | Artificial Sequence | Exemplary motif | cgaggcgcac ccgcagctcc tcggtgctct ggccgagcat ggcctgcacc tcgccgcg
93 | 59 | DNA | Artificial Sequence | Exemplary motif | cctccacctg cgcaagctgc gtaagcggct cctccgcgat gccgatgacc tgcagaagc
94 | 59 | DNA | Artificial Sequence | Exemplary motif | gcttctgcag gtcatcggca tcgcggagga gccgcttacg cagcttgcgc aggtggagg
95 | 10527 | DNA | Homo sapiens | - |
ctggtggagc atctgatggg tgtttgggcc aagctggagc tttgtccatc ccctcttatt 60tttctgcact tgactctctt atttttctga gactggtctc cctctgtcgc ccaggctaga 120gtgcagcagt gcaactgcgg ctcactgcag cctccacctc ccgggctcaa gcagccttcc 180cacctcagcc tcctgagtag ctaggaccac aggtgtatgc caccaggccc agctaatttt 240tttgatagtt ttgggagaca tgggggtttc accatgttgc ccaggctggt ctcgaactcc 300tggactcaag ccttggcctc ccaaagtgct gggattatag gtgtgagcca ccacacccag 360ccagggtaga aggcactttg gaagcctcga gcctgcccca ttcatcttac gttagtggaa 420actgaggctt ccagaggttt caaggtcaca actaaatcca gaacctcatc tcaggcacac 480tggtcgtagt cccaatgtcc agtcttaagt cttcttggat atctgtggct cacagatttt 540gggtgtttga gcctcctgct gagcactgct ggggccacag cggtgaccag ccctgtcttc 600acgggactca gtgagaggaa cagattcatc cgcagagtgg gcaggactag gttgggggaa 660cccaggggtc tagagggctt ttcagagggc aggggtcact gagcggagag cagaggagga 720gtgagccatt tgctccagcg tgaagttgtt ggtgtgatgg ggtttcaggg tggcaggagc 780agtgtggtta aaggtctgga agctgtcggc atgtggctgg tatccaaggt ggccaggaac 840tctgcatgga tatggtggga agctggcacg cctctcacct cagctcttcc 
ctgcaggctc 900tgtggatagc aactggatcg tgggtgccac gctggagaag aagctcccac ccctgcccct 960gacactggcc cttggggcct tcctgaatca ccgcaagaac aagtttcagt gtggctttgg 1020cctcaccatc ggctgagccc tcctggcccc cgccttccac gcccttccga ttccacctcc 1080acctccacct ccccctgcca cagaggggag acctgagccc ccctcccttc cctcccccct 1140tgggggtcgg gggggacatt ggaaaggagg gaccccgcca ccccagcagc tgaggagggg 1200attctggaac tgaatggcgc ttcgggattc tgagtagcag gggcagcatg cccagtgggc 1260ctggggtccc gggagggatt ccggaattga ggggcacgca ggattctgag caccaggggc 1320agaggcggcc agacaacctc agggaggagt gtcctggcgt ccccatcctc caaagggcct 1380gggcccgccc cgagggggca gcgagaggag cttccccatc cccggtcagt ccaccctgcc 1440ccgtccactt tcccatctcc tcggtataaa tcatgtttat aagttatgga agaaccggga 1500cattttacag aaaaaaaaca aaaaacaaca aaaaatatac gtgggaaaaa aaacgatggg 1560aggcctccgt tttctcaagt gtgtctggcc tgttttgagc atttcatccg gagtctggcc 1620gccctgacct tcccccagcc gcctgcaggg ggcgccagag ggccggagca cggaaagcag 1680cggatccttg atgctgcctt aagtccggct cagaggggcg cagcgtggcc tggggtcgct 1740atcttcccat ccggaacatc tgccctgctg ggggacacta cgggccttcc cttgcctgag 1800ggtagggtct caaggtcact tgcccccagc ttgacctggc cggagtggct atagaggact 1860ttgtccctgc agactgcagc agcagagatg acactgtctc tgagtgcaga gatgggggca 1920gggagctggg agagggttca agctactgga acagcttcag aacaactagg gtactaggaa 1980ctgctgtgtc agggagaagg ggctcaagga ctcgcaggcc tgggaggagg ggcctaggcc 2040agccatggga gttgggtcac ctgtgtctga ggacttggtg ctgtctggat tttgccaacc 2100tagggctggg gtcagctgat gcccaccacg actcccgagc ctccaggaac tgaaaccctg 2160tctgccccca gggtctgggg aaggaggctg ctgagtagaa ccaaccccag gttaccaacc 2220ccacctcagc caccccttgc cagccaaagc aaacaggccc ggcccggcac tgggggttcc 2280ttctcgaacc aggagttcag cctcccctga cccgcagaat cttctgatcc cacccgctcc 2340aggagccagg aatgagtccc agtctctccc agttctcact gtgtggtttt gccattcgtc 2400ttgctgctga accacgggtt tctcctctga aacatctggg atttataaca gggcttagga 2460aagtgacagc gtctgagcgt tcactgtggc ctgtccattg ctagccctaa cataggaccg 2520ctgtgtgcca gggctgtcct ccatgctcaa tacacgttag cttgtcacca aacatacccg 2580tgccgctgct ttcccagtct 
gatgagcaaa ggaacttgat gctcagagag gacaagtcat 2640ttgcccaagg tcacacagct ggcaactggc agagccagga ttcacgccct ggcaatttga 2700ctccagaatc ctaaccttaa cccagaagca cggcttcaag cccctggaaa ccacaatacc 2760tgtggcagcc agggggaggt gctggaatct catttcacat gtggggaggg ggctcccctg 2820tgctcaaggt cacaaccaaa gaggaagctg tgattaaaac ccaggtccca tttgcaaagc 2880ctcgactttt agcaggtgca tcatactgtt cccacccctc ccatcccact tctgtccagc 2940cgcctagccc cactttcttt tttttctttt tttgagacag tctccctctt gctgaggctg 3000gagtgcagtg gcgagatctc ggctcactgt aacctccgcc tcccgggttc aagcgattct 3060cctgcctcag cctcccaagt agctaggatt acaggcgccc gccaccacgc ctggctaact 3120tttgtatttt tagtagagat ggggtttcac catgttggcc aggctggtct caaactcctg 3180accttaagtg attcgcccac tgtggcctcc caaagtgctg ggattacagg cgtgactacc 3240gcccccagcc cctcccatcc cacttctgtc cagcccccta gccctacttt ctttctggga 3300tccaggagtc cagatcccca gccccctctc cagattacat tcatccaggc acaggaaagg 3360acagggtcag gaaaggagga ctctgggcgg cagcctccac attccccttc cacgcttggc 3420ccccagaatg gaggagggtg tctgtattac tgggcgaggt gtcctccctt cctggggact 3480gtggggggtg gtcaaaagac ctctatgccc cacctccttc ctccctctgc cctgctgtgc 3540ctggggcagg gggagaacag cccacctcgt gactgggggc tggcccagcc cgccctatcc 3600ctgggggagg gggcgggaca gggggagccc tataattgga caagtctggg atccttgagt 3660cctactcagc cccagcggag gtgaaggacg tccttcccca ggagccggtg agaagcgcag 3720tcgggggcac ggggatgagc tcaggggcct ctagaaagag ctgggaccct gggaacccct 3780ggcctccagg tagtctcagg agagctactc ggggtcgggc ttggggagag gaggagcggg 3840ggtgaggcaa gcagcagggg actggacctg ggaagggctg ggcagcagag acgacccgac 3900ccgctagaag gtggggtggg gagagcagct ggactgggat gtaagccata gcaggactcc 3960acgagttgtc actatcattt atcgagcacc tactgggtgt ccccagtgtc ctcagatctc 4020cataactggg gagccagggg cagcgacacg gtagctagcc gtcgattgga gaactttaaa 4080atgaggactg aattagctca taaatggaac acggcgctta actgtgaggt tggagcttag 4140aatgtgaagg gagaatgagg aatgcgagac tgggactgag atggaaccgg cggtggggag 4200ggggtggggg gatggaattt gaaccccggg agaggaagat ggaattttct atggaggccg 4260acctggggat ggggagataa gagaagacca ggagggagtt aaatagggaa 
tgggttgggg 4320gcggcttggt aaatgtgctg ggattaggct gttgcagata atgcaacaag gcttggaagg 4380ctaacctggg gtgaggccgg gttggggccg ggctgggggt gggaggagtc ctcactggcg 4440gttgattgac agtttctcct tccccagact ggccaatcac aggcaggaag atgaaggttc 4500tgtgggctgc gttgctggtc acattcctgg caggtatggg ggcggggctt gctcggttcc 4560ccccgctcct ccccctctca tcctcacctc aacctcctgg ccccattcag gcagaccctg 4620ggccccctct tctgaggctt ctgtgctgct tcctggctct gaacagcgat ttgacgctct 4680ctgggcctcg gtttccccca tccttgagat aggagttaga agttgttttg ttgttgttgt 4740ttgttgttgt tgttttgttt ttttgagatg aagtctcgct ctgtcgccca ggctggagtg 4800cagtggcggg atctcggctc actgcaagct ccgcctccca ggtccacgcc attctcctgc 4860ctcagcctcc caagtagctg ggactacagg cacatgccac cacacccgac taactttttt 4920gtattttcag tagagacggg gtttcaccat gttggccagg ctggtctgga actcctgacc 4980tcaggtgatc tgcccgtttc gatctcccaa agtgctggga ttacaggcgt gagccaccgc 5040acctggctgg gagttagagg tttctaatgc attgcaggca gatagtgaat accagacacg 5100gggcagctgt gatctttatt ctccatcacc cccacacagc cctgcctggg gcacacaagg 5160acactcaata catgcttttc cgctgggcgc ggtggctcac ccctgtaatc ccagcacttt 5220gggaggccaa ggtgggagga tcacttgagc ccaggagttc aacaccagcc tgggcaacat 5280agtgagaccc tgtctctact aaaaatacaa aaattagcca ggcatggtgc cacacacctg 5340tgctctcagc tactcaggag gctgaggcag gaggatcgct tgagcccaga aggtcaaggt 5400tgcagtgaac catgttcagg ccgctgcact ccagcctggg tgacagagca agaccctgtt 5460tataaataca taatgctttc caagtgatta aaccgactcc cccctcaccc tgcccaccat 5520ggctccaaag aagcatttgt ggagcacctt ctgtgtgccc ctaggtacta gatgcctgga 5580cggggtcaga aggaccctga cccaccttga acttgttcca cacaggatgc caggccaagg 5640tggagcaagc ggtggagaca gagccggagc ccgagctgcg ccagcagacc gagtggcaga 5700gcggccagcg ctgggaactg gcactgggtc gcttttggga ttacctgcgc tgggtgcaga 5760cactgtctga gcaggtgcag gaggagctgc tcagctccca ggtcacccag gaactgaggt 5820gagtgtcccc atcctggccc ttgaccctcc tggtgggcgg ctatacctcc ccaggtccag 5880gtttcattct gcccctgtcg ctaagtcttg gggggcctgg gtctctgctg gttctagctt 5940cctcttccca tttctgactc ctggctttag ctctctggaa ttctctctct cagctttgtc 6000tctctctctt cccttctgac 
tcagtctctc acactcgtcc tggctctgtc tctgtccttc 6060cctagctctt ttatatagag acagagagat ggggtctcac tgtgttgccc aggctggtct 6120tgaacttctg ggctcaagcg atcctcccgc ctcggcctcc caaagtgctg ggattagagg 6180catgagccac cttgcccggc ctcctagctc cttcttcgtc tctgcctctg ccctctgcat 6240ctgctctctg catctgtctc tgtctccttc tctcggcctc tgccccgttc cttctctccc 6300tcttgggtct ctctggctca tccccatctc gcccgcccca tcccagccct tctccccgcc 6360tcccactgtg cgacaccctc ccgccctctc ggccgcaggg cgctgatgga cgagaccatg 6420aaggagttga aggcctacaa atcggaactg gaggaacaac tgaccccggt ggcggaggag 6480acgcgggcac ggctgtccaa ggagctgcag gcggcgcagg cccggctggg cgcggacatg 6540gaggacgtgt gcggccgcct ggtgcagtac cgcggcgagg tgcaggccat gctcggccag 6600agcaccgagg agctgcgggt gcgcctcgcc tcccacctgc gcaagctgcg taagcggctc 6660ctccgcgatg ccgatgacct gcagaagcgc ctggcagtgt accaggccgg ggcccgcgag 6720ggcgccgagc gcggcctcag cgccatccgc gagcgcctgg ggcccctggt ggaacagggc 6780cgcgtgcggg ccgccactgt gggctccctg gccggccagc cgctacagga gcgggcccag 6840gcctggggcg agcggctgcg cgcgcggatg gaggagatgg gcagccggac ccgcgaccgc 6900ctggacgagg tgaaggagca ggtggcggag gtgcgcgcca agctggagga gcaggcccag 6960cagatacgcc tgcaggccga ggccttccag gcccgcctca agagctggtt cgagcccctg 7020gtggaagaca tgcagcgcca gtgggccggg ctggtggaga aggtgcaggc tgccgtgggc 7080accagcgccg cccctgtgcc cagcgacaat cactgaacgc cgaagcctgc agccatgcga 7140ccccacgcca ccccgtgcct cctgcctccg cgcagcctgc agcgggagac cctgtccccg 7200ccccagccgt cctcctgggg tggaccctag tttaataaag attcaccaag tttcacgcat 7260ctgctggcct ccccctgtga tttcctctaa gccccagcct cagtttctct ttctgcccac 7320atactggcca cacaattctc agccccctcc tctccatctg tgtctgtgtg tatctttctc 7380tctgcccttt tttttttttt tagacggagt ctggctctgt cacccaggct agagtgcagt 7440ggcacgatct tggctcactg caacctctgc ctcttgggtt caagcgattc tgctgcctca 7500gtagctggga ttacaggctc acaccaccac acccggctaa tttttgtatt tttagtagag 7560acgagctttc accatgttgg ccaggcaggt ctcaaactcc tgaccaagtg atccacccgc 7620cggcctccca aagtgctgag attacaggcc tgagccacca tgcccggcct ctgcccctct 7680ttctttttta gggggcaggg aaaggtctca ccctgtcacc cgccatcaca 
gctcactgca 7740gcctccacct cctggactca agtgataagt gatcctcccg cctcagcctt tccagtagct 7800gagactacag gcgcatacca ctaggattaa tttggggggg gggtggtgtg tgtggagatg 7860gggtctggct ttgttggcca ggctgatgtg gaattcctgg gctcaagcga tactcccacc 7920ttggcctcct gagtagctga gactactggc tagcaccacc acacccagct ttttattatt 7980atttgtagag acaaggtctc aatatgttgc ccaggctagt ctcaaacccc tgggctcaag 8040agatcctccg ccatcggcct cccaaagtgc tgggattcca ggcatggggc tccgagcccg 8100gcctgcccaa cttaataata cttgttcctc agagttgcaa ctccaaatga cctgagattg 8160gtgcctttat tctaagctat tttcattttt tttctgctgt cattattctc ccccttctct 8220cctccagtct tatctgatat ctgcctcctt cccacccacc ctgcacccca tcccacccct 8280ctgtctctcc ctgttctcct caggagactc tggcttcctg ttttcctcca cttctatctt 8340ttatctctcc ctcctacggt ttcttttctt tctccccggc ctgcttgttt ctcccccaac 8400ccccttcatc tggatttctt cttctgccat tcagtttggt ttgagctctc tgcttctccg 8460gttccctctg agctagctgt cccttcaccc actgtgaact gggtttccct gcccaaccct 8520cattctcttt ctttctttct tttttttttt tttttttttt tttttttttt gagacagagt 8580cttgctctgt tgcccagcct ggagtgcagt ggtgcaatct tggttcactg caacctccac 8640ttcccagatt caagcaattc tcctgcctca gcctccagag tagctgggat tacaggcgtg 8700tcccaccaca cccgactaat ttttgtattt ttggtagaga caaggcttcg gcattgttgg 8760ccaggcaggt ctcgaactcc tgacctcaag taatctgcct gcctcaccct cccaaagtgc 8820tgggattaca ggcatgagcc acctcacccg gaccatccct cattctccat cctttcctcc 8880agttgtgatg tctacccctc atgtttccca acaagcctac tgggtgctga atccaggctg 8940ggaagagaag ggagcggctc ttctgtcgga gtctgcacca ggcccatgct gagacgagag 9000ctggcgctca gagaggggaa gcttggatgg aagcccagga gccgccggca ctctcttctc 9060ctcccacccc ctcagttctc agagacgggg aggagggttc ccaccaacgg gggacaggct 9120gagacttgag cttgtatctc ctgggccagc tgcaacatct gcttgtccct ctgcccatct 9180tggctcctgc acaccctgaa cttggtgctt tccctggcac tgctctgatc acccacgtgg 9240aggcagcacc cctcccctgg agatgactca ccagggctga gtgaggaggg gaagggtcag 9300tgtgctcaca ggcagggggc ctggtctgct gggcctgctg ctgattcacc gtatgtccag 9360catgcgttag gagggacatt tcaaactctt ttttacccta gactttccta ccatcaccca 9420gagtatccag ccaggagggg 
aggggctaga gacaccagaa gtttagcagg gaggagggcg 9480tagggattcg gggaatgaag ggatgggatt cagactaggg ccaggaccca gggatggaga 9540gaaagagatg agagtggttt gggggcttgg tgacttagag aacagagctg caggctcaga 9600ggcacacagg agtttctggg ctcaccctgc ccccttccaa cccctcagtt cccatcctcc 9660agcagctgtt tgtgtgctgc ctctgaagtc cacactgaac aaacttcagc ctactcatgt 9720ccctaaaatg ggcaaacatt gcaagcagca aacagcaaac acacagccct ccctgcctgc 9780tgaccttgga gctggggcag aggtcagaga cctctctggg cccatgccac ctccaacatc 9840cactcgaccc cttggaattt cggtggagag gagcagaggt tgtcctggcg tggtttaggt 9900agtgtgagag ggtccgggtt caaaaccact tgctgggtgg ggagtcgtca gtaagtggct 9960atgccccgac cccgaagcct gtttccccat ctgtacaatg gaaatgataa agacgcccat 10020ctgatagggt ttttgtggca aataaacatt tggttttttt gttttgtttt gttttgtttt 10080ttgagatgga ggtttgctct gtcgcccagg ctggagtgca gtgacacaat ctcatctcac 10140cacaaccttc ccctgcctca gcctcccaag tagctgggat tacaagcatg tgccaccaca 10200cctggctaat tttctatttt tagtagagac gggtttctcc atgttggtca gcctcagcct 10260cccaagtaac tgggattaca ggcctgtgcc accacacccg gctaattttt tctatttttg 10320acagggacgg ggtttcacca tgttggtcag gctggtctag aactcctgac ctcaaatgat 10380ccacccacct aggcctccca aagtgcacag attacaggcg tgggccaccg cacctggcca 10440aaaagatggt cttgtggggt aatgaaggac acaagcttgg tgggacctga gtccccaggc 10500tggcatagag ccccttactc cctgtgt 10527
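Several exemplary motifs in the listing, for example SEQ ID NO 75, contain GGATG, the recognition sequence of the type IIS enzyme FokI, which cuts outside its site at GGATG(9/13). This section does not state which enzyme the motifs were designed for, so the sketch below simply assumes FokI as a representative type IIS enzyme. It locates sites on both strands and reports the resulting cut positions, the calculation needed to predict the short fragment that would carry a variance. All names in the code are illustrative.

```python
# FokI is an assumed example of a type IIS enzyme, not one named in this section.
SITE = "ggatg"               # recognition sequence read 5'->3' on the top strand
SITE_RC = "catcc"            # the same site when it lies on the bottom strand
TOP_CUT, BOTTOM_CUT = 9, 13  # GGATG(9/13): cut distances beyond the site


def revcomp(seq):
    """Reverse complement of a DNA string (lower case, 'n' allowed)."""
    return seq.lower().translate(str.maketrans("acgtn", "tgcan"))[::-1]


def find_all(seq, motif):
    """Yield every start index of motif in seq, including overlapping hits."""
    start = seq.find(motif)
    while start != -1:
        yield start
        start = seq.find(motif, start + 1)


def foki_cuts(seq):
    """List (site_start, strand, top_cut, bottom_cut) in top-strand coordinates.

    A cut value p means the bond between seq[p - 1] and seq[p] is cleaved on that
    strand; values below 0 or above len(seq) fall outside the given fragment.
    """
    seq = seq.lower()
    hits = []
    for i in find_all(seq, SITE):
        end = i + len(SITE)
        hits.append((i, "+", end + TOP_CUT, end + BOTTOM_CUT))
    for i in find_all(seq, SITE_RC):
        # Bottom-strand site: cleavage lies to its left in top-strand coordinates.
        hits.append((i, "-", i - BOTTOM_CUT, i - TOP_CUT))
    return sorted(hits)


motif_75 = "cccggctggg cgcggacatg ggatgcgcaa ggacgtg".replace(" ", "")
print(foki_cuts(motif_75))           # [(20, '+', 34, 38)]
print(foki_cuts(revcomp(motif_75)))  # [(12, '-', -1, 3)]
```

For the 37-nt SEQ ID NO 75 the top-strand cut falls inside the motif but the bottom-strand cut does not, so in a longer amplicon the bottom-strand end of the fragment would lie in downstream flanking sequence.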

What is claimed is:
 1. A method of detecting the presence of a variance in a target nucleic acid sequence, comprising amplifying a template nucleic acid sequence to generate an amplification product, wherein said amplifying results in the insertion of a bifunctional restriction enzyme recognition site into said amplification product; and analyzing said amplification product.
 2. The method of claim 1, wherein the bifunctional restriction enzyme recognition site is inserted by a primer.
 3. The method of claim 2, wherein the primer contains a 5′ end complementary to said template nucleic acid, a 3′ end complementary to said template, and said recognition site comprises a bifunctional restriction enzyme recognition site.
 4. The method of claim 2, wherein the bifunctional restriction enzyme recognition site is inserted into the amplification product using a primer which results in the incorporation of a sequence of bases not present in or complementary to the template.
 5. The method of claim 1, further comprising the step of cleaving the amplification product with one or more restriction enzymes.
 6. The method of claim 5, wherein the restriction enzymes comprise at least one type IIS restriction enzyme.
 7. A method of detecting the presence of a variance in a target nucleic acid molecule by mass spectrometry, comprising: a) providing said target nucleic acid molecule; b) providing a first primer, wherein said first primer results in the insertion of a restriction enzyme recognition site in an amplification product; c) providing a second primer which is 100% complementary to the target nucleic acid molecule; d) amplifying a portion of the target nucleic acid using said primers to generate said amplification product; e) cleaving said amplification product with at least one restriction enzyme, a product of said cleaving being a nucleic acid fragment including the putative variance and being sufficiently small to be analyzed by mass spectrometry; and f) analyzing said fragment by mass spectrometry.
 8. The method of claim 7, wherein the restriction enzyme recognition site is bifunctional.
 9. The method of claim 8, wherein the first primer comprises a 5′ end complementary to said target, a 3′ end complementary to said target, and said recognition site comprises said bifunctional restriction enzyme recognition site.
 10. The method of claim 7, wherein said restriction enzymes comprise at least one type IIS restriction enzyme.
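The mass-spectrometry method of claim 7 depends on cleavage producing a fragment small enough to analyse, after which the allele is read from that fragment's mass. The sketch below estimates the average mass of a single-stranded DNA fragment and the mass shift caused by each possible single-base substitution. The residue masses and the -61.96 Da correction follow the average-mass approximation used by common oligonucleotide calculators for a strand with 5'- and 3'-hydroxyl ends. They are assumptions here, not values from this application, but the allele-to-allele shift does not depend on the termini.

```python
from itertools import combinations

# Average residue masses (Da) of nucleotides within a DNA strand (assumed values).
RESIDUE_MASS = {"a": 313.21, "c": 289.18, "g": 329.21, "t": 304.20}
TERMINI_CORRECTION = -61.96  # assumed 5'-OH / 3'-OH strand without a 5' phosphate


def oligo_mass(seq):
    """Approximate average mass (Da) of a single-stranded DNA fragment."""
    seq = seq.lower()
    if not seq or any(base not in RESIDUE_MASS for base in seq):
        raise ValueError("expected a non-empty string of a, c, g, t")
    return sum(RESIDUE_MASS[base] for base in seq) + TERMINI_CORRECTION


def substitution_shifts():
    """Mass difference for each possible single-base substitution, smallest first."""
    shifts = {
        (x, y): abs(RESIDUE_MASS[x] - RESIDUE_MASS[y])
        for x, y in combinations(sorted(RESIDUE_MASS), 2)
    }
    return sorted(shifts.items(), key=lambda item: item[1])


for (x, y), delta in substitution_shifts():
    print(f"{x.upper()}/{y.upper()}: {delta:.2f} Da")
```

The A/T substitution gives the smallest shift, about 9 Da, so an A/T site such as 17874 in Table 3 is the hardest to resolve. A 9 Da difference is a larger fraction of a short fragment's total mass, which is one reason to keep the analysed fragment small.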